Buying and selling used phones and tablets used to be something that happened on a handful of online marketplace sites. But the used and refurbished device market has grown considerably over the past decade, and a new IDC (International Data Corporation) forecast predicts that the used phone market would be worth \$52.7bn by 2023 with a compound annual growth rate (CAGR) of 13.6% from 2018 to 2023. This growth can be attributed to an uptick in demand for used phones and tablets that offer considerable savings compared with new models.
Refurbished and used devices continue to provide cost-effective alternatives to both consumers and businesses that are looking to save money when purchasing one. There are plenty of other benefits associated with the used device market. Used and refurbished devices can be sold with warranties and can also be insured with proof of purchase. Third-party vendors/platforms, such as Verizon, Amazon, etc., provide attractive offers to customers for refurbished devices. Maximizing the longevity of devices through second-hand trade also reduces their environmental impact and helps in recycling and reducing waste. The impact of the COVID-19 outbreak may further boost this segment as consumers cut back on discretionary spending and buy phones and tablets only for immediate needs.
The rising potential of this comparatively under-the-radar market fuels the need for an ML-based solution to develop a dynamic pricing strategy for used and refurbished devices. ReCell, a startup aiming to tap the potential in this market, has hired you as a data scientist. They want you to analyze the data provided and build a linear regression model to predict the price of a used phone/tablet and identify factors that significantly influence it.
The data contains the different attributes of used/refurbished phones and tablets. The data was collected in the year 2021. The detailed data dictionary is given below.
Data Dictionary
# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# to split the data into train and test
from sklearn.model_selection import train_test_split
# to build linear regression_model
import statsmodels.api as sm
# to check model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
data = pd.read_csv('used_device_data.csv')
data.shape
(3454, 15)
Dataset contains 3454 rows and 15 columns
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3454 entries, 0 to 3453 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 brand_name 3454 non-null object 1 os 3454 non-null object 2 screen_size 3454 non-null float64 3 4g 3454 non-null object 4 5g 3454 non-null object 5 main_camera_mp 3275 non-null float64 6 selfie_camera_mp 3452 non-null float64 7 int_memory 3450 non-null float64 8 ram 3450 non-null float64 9 battery 3448 non-null float64 10 weight 3447 non-null float64 11 release_year 3454 non-null int64 12 days_used 3454 non-null int64 13 normalized_used_price 3454 non-null float64 14 normalized_new_price 3454 non-null float64 dtypes: float64(9), int64(2), object(4) memory usage: 404.9+ KB
Most of the datatypes are float with 4 object and 2 integer columns. It appears not all the rows are the same length, so adjusting for missing data will be addressed later
data.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| screen_size | 3454.0 | 13.713115 | 3.805280 | 5.080000 | 12.700000 | 12.830000 | 15.340000 | 30.710000 |
| main_camera_mp | 3275.0 | 9.460208 | 4.815461 | 0.080000 | 5.000000 | 8.000000 | 13.000000 | 48.000000 |
| selfie_camera_mp | 3452.0 | 6.554229 | 6.970372 | 0.000000 | 2.000000 | 5.000000 | 8.000000 | 32.000000 |
| int_memory | 3450.0 | 54.573099 | 84.972371 | 0.010000 | 16.000000 | 32.000000 | 64.000000 | 1024.000000 |
| ram | 3450.0 | 4.036122 | 1.365105 | 0.020000 | 4.000000 | 4.000000 | 4.000000 | 12.000000 |
| battery | 3448.0 | 3133.402697 | 1299.682844 | 500.000000 | 2100.000000 | 3000.000000 | 4000.000000 | 9720.000000 |
| weight | 3447.0 | 182.751871 | 88.413228 | 69.000000 | 142.000000 | 160.000000 | 185.000000 | 855.000000 |
| release_year | 3454.0 | 2015.965258 | 2.298455 | 2013.000000 | 2014.000000 | 2015.500000 | 2018.000000 | 2020.000000 |
| days_used | 3454.0 | 674.869716 | 248.580166 | 91.000000 | 533.500000 | 690.500000 | 868.750000 | 1094.000000 |
| normalized_used_price | 3454.0 | 4.364712 | 0.588914 | 1.536867 | 4.033931 | 4.405133 | 4.755700 | 6.619433 |
| normalized_new_price | 3454.0 | 5.233107 | 0.683637 | 2.901422 | 4.790342 | 5.245892 | 5.673718 | 7.847841 |
data.sample(
10, random_state=2
)
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1207 | HTC | Android | 12.70 | no | no | 8.0 | 5.0 | 32.0 | 4.0 | 2100.0 | 160.00 | 2014 | 1044 | 4.258446 | 5.247287 |
| 143 | Samsung | Android | 20.32 | yes | no | 5.0 | 2.0 | 16.0 | 1.5 | 4200.0 | 313.00 | 2015 | 909 | 4.326117 | 5.348202 |
| 3202 | Huawei | Android | 16.69 | yes | yes | 10.5 | 16.0 | 128.0 | 8.0 | 4000.0 | 178.00 | 2020 | 221 | 4.856474 | 6.152456 |
| 1127 | Honor | Android | 10.29 | no | no | 8.0 | 2.0 | 32.0 | 4.0 | 1730.0 | 159.00 | 2015 | 894 | 3.883006 | 4.248209 |
| 2926 | Xiaomi | Android | 14.83 | yes | no | 13.0 | 8.0 | 64.0 | 4.0 | 3000.0 | 150.00 | 2018 | 446 | 4.661551 | 5.131672 |
| 1726 | LG | Android | 7.67 | no | no | 2.0 | 1.3 | 16.0 | 4.0 | 1540.0 | 140.45 | 2014 | 952 | 3.384051 | 4.116595 |
| 62 | Infinix | Android | 15.39 | yes | no | NaN | 16.0 | 32.0 | 3.0 | 4000.0 | 178.00 | 2019 | 316 | 4.555244 | 4.602166 |
| 3273 | Sony | Android | 15.37 | yes | yes | NaN | 8.0 | 256.0 | 8.0 | 4000.0 | 181.40 | 2020 | 236 | 5.483427 | 6.839637 |
| 1411 | Others | Others | 19.94 | no | no | 5.0 | 2.0 | 32.0 | 4.0 | 4450.0 | 384.00 | 2014 | 767 | 4.368054 | 5.140552 |
| 551 | Alcatel | Android | 10.29 | no | no | 5.0 | 0.3 | 16.0 | 4.0 | 1800.0 | 157.00 | 2013 | 921 | 3.937496 | 4.603669 |
df = data.copy()
cols = ['brand_name', 'os', '4g', '5g']
df[cols] = df[cols].astype('category')
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3454 entries, 0 to 3453 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 brand_name 3454 non-null category 1 os 3454 non-null category 2 screen_size 3454 non-null float64 3 4g 3454 non-null category 4 5g 3454 non-null category 5 main_camera_mp 3275 non-null float64 6 selfie_camera_mp 3452 non-null float64 7 int_memory 3450 non-null float64 8 ram 3450 non-null float64 9 battery 3448 non-null float64 10 weight 3447 non-null float64 11 release_year 3454 non-null int64 12 days_used 3454 non-null int64 13 normalized_used_price 3454 non-null float64 14 normalized_new_price 3454 non-null float64 dtypes: category(4), float64(9), int64(2) memory usage: 312.2 KB
df.duplicated().sum()
0
Copy the original data dataframe into a new dataframe ensuring no duplication and change all object columns to category datatype.
df.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 179 selfie_camera_mp 2 int_memory 4 ram 4 battery 6 weight 7 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
Several columns appear to be missing values. Main Camera MP has the most missing values
Questions:
# Univariate Analysis
def histogram_boxplot(data, feature, figsize=(12, 7), kde=False, bins=None):
"""
Boxplot and histogram combined
data: dataframe
feature: dataframe column
figsize: size of figure (default (12,7))
kde: whether to the show density curve (default False)
bins: number of bins for histogram (default None)
"""
f2, (ax_box2, ax_hist2) = plt.subplots(
nrows=2, # Number of rows of the subplot grid= 2
sharex=True, # x-axis will be shared among all subplots
gridspec_kw={"height_ratios": (0.25, 0.75)},
figsize=figsize,
) # creating the 2 subplots
sns.boxplot(
data=data, x=feature, ax=ax_box2, showmeans=True, color="violet"
) # boxplot will be created and a star will indicate the mean value of the column
sns.histplot(
data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins, palette="winter"
) if bins else sns.histplot(
data=data, x=feature, kde=kde, ax=ax_hist2
) # For histogram
ax_hist2.axvline(
data[feature].mean(), color="green", linestyle="--"
) # Add mean to the histogram
ax_hist2.axvline(
data[feature].median(), color="black", linestyle="-"
) # Add median to the histogram
histogram_boxplot(df,'normalized_used_price')
Normal used price distribution looks roughly normal with some outliers on both sides.
histogram_boxplot(df,'normalized_new_price')
Normalized new price distribution also looks roughly normal with outliers on both sides.
histogram_boxplot(df,'weight')
Weight distribution looks roughly normal, but is highly skewed to the right.
histogram_boxplot(df,'ram')
Ram does not have a normal distribution.
histogram_boxplot(df,'days_used')
Days used does not have a normal distribution, but it has no outliers.
histogram_boxplot(df,'release_year')
Release year is also not normally distributed, but with no outliers.
histogram_boxplot(df,'battery')
Battery is not normally distributed with a high right skew to the data.
histogram_boxplot(df,'int_memory')
Internal Memory is also not normally distributed with a heavy right skew. Due to the units it is skewed, it may benefit from a log fitting for the model.
histogram_boxplot(df,'selfie_camera_mp')
histogram_boxplot(df,'main_camera_mp')
Both Selfie and Main camera megapixels are not normally distributed and skewed to the right with outliers.
What percentage of the used device market is dominated by Android devices?
len(df[df['os'] == "Android"])/len(df['os'])*100
93.05153445280834
sns.countplot(data=df, x='os')
<AxesSubplot:xlabel='os', ylabel='count'>
93% of all devices sold are Android.
The amount of RAM is important for the smooth functioning of a device. How does the amount of RAM vary with the brand?
sns.set(font_scale = 2)
ax = sns.boxplot(x="brand_name", y="ram", data=data)
ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)
ax.set_xlabel("Brand Name", fontsize = 20)
ax.set_ylabel("RAM", fontsize = 20)
sns.set(rc = {'figure.figsize':(30,20)})
plt.show()
RAM offered ranges from 0.02 - 12 GB across brands. Several Brands offer up to 12 GB RAM (Huawei, Motorola, OnePlus, Oppo, Samsung, and Xiaomi) on set devices. Most brands offer 4 GB of Ram.
A large battery often increases a device's weight, making it feel uncomfortable in the hands. How does the weight vary for phones and tablets offering large batteries (more than 4500 mAh)?
batdf = df.loc[(df['battery'] > 4500)]
batdf.head()
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Honor | Android | 25.50 | yes | yes | 13.0 | 8.0 | 64.0 | 6.0 | 7250.0 | 480.0 | 2020 | 345 | 5.135387 | 5.630961 |
| 4 | Honor | Android | 15.32 | yes | no | 13.0 | 8.0 | 64.0 | 3.0 | 5000.0 | 185.0 | 2020 | 293 | 4.389995 | 4.947837 |
| 11 | Honor | Android | 15.32 | yes | no | 13.0 | 8.0 | 64.0 | 4.0 | 5000.0 | 185.0 | 2020 | 344 | 4.657383 | 4.770176 |
| 19 | Honor | Android | 25.43 | yes | no | 5.0 | 2.0 | 32.0 | 3.0 | 5100.0 | 173.0 | 2019 | 266 | 4.267878 | 5.517051 |
| 20 | Honor | Android | 20.32 | yes | no | 8.0 | 8.0 | 32.0 | 3.0 | 5100.0 | 173.0 | 2019 | 321 | 4.556295 | 5.304001 |
sns.set(font_scale = 2)
ax = sns.scatterplot(x="weight", y="battery", data=batdf)
sns.set(rc = {'figure.figsize':(30,20)})
plt.show()
#pd.DataFrame(data=data,columns=['A','B'])
#M = df['B'].to_numpy()
Testdf = df.dropna()
x = Testdf['weight'].to_numpy()
y = Testdf['battery'].to_numpy()
np.corrcoef(x, y)
array([[1. , 0.71675693],
[0.71675693, 1. ]])
Weight is positively correlated with battery and has a 71% correlation. So generally as weight increases, so does the energy capacity of the battery.
Bigger screens are desirable for entertainment purposes as they offer a better viewing experience. How many phones and tablets are available across different brands with a screen size larger than 6 inches?
sns.set(font_scale = 2)
ax = sns.boxplot(x="brand_name", y="screen_size", data=data)
ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)
sns.set(rc = {'figure.figsize':(30,20)})
plt.show()
len((data['screen_size'] > 15.24))
3454
6 inches = 15.24 cm
A lot of devices nowadays offer great selfie cameras, allowing us to capture our favorite moments with loved ones. What is the distribution of devices offering greater than 8MP selfie cameras across brands?
selfiedf = df.loc[(df['selfie_camera_mp'] > 8)]
selfiedf.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| screen_size | 655.0 | 15.376595 | 1.461576 | 5.280000 | 15.270000 | 15.370000 | 16.280000 | 20.350000 |
| main_camera_mp | 535.0 | 12.138692 | 4.449784 | 0.300000 | 10.500000 | 13.000000 | 13.000000 | 48.000000 |
| selfie_camera_mp | 655.0 | 18.750840 | 6.324722 | 9.000000 | 16.000000 | 16.000000 | 20.000000 | 32.000000 |
| int_memory | 655.0 | 109.874809 | 82.725308 | 8.000000 | 64.000000 | 128.000000 | 128.000000 | 1024.000000 |
| ram | 655.0 | 5.273282 | 1.977736 | 1.000000 | 4.000000 | 4.000000 | 6.000000 | 12.000000 |
| battery | 655.0 | 3885.786260 | 643.128130 | 1470.000000 | 3400.000000 | 4000.000000 | 4200.000000 | 6000.000000 |
| weight | 655.0 | 182.504656 | 22.950825 | 127.000000 | 167.200000 | 181.000000 | 196.500000 | 300.000000 |
| release_year | 655.0 | 2018.619847 | 1.240463 | 2013.000000 | 2018.000000 | 2019.000000 | 2019.000000 | 2020.000000 |
| days_used | 655.0 | 425.967939 | 206.892969 | 91.000000 | 264.500000 | 386.000000 | 557.500000 | 1091.000000 |
| normalized_used_price | 655.0 | 4.963784 | 0.345042 | 3.758406 | 4.734091 | 4.932169 | 5.164982 | 6.619433 |
| normalized_new_price | 655.0 | 5.792289 | 0.556782 | 4.602166 | 5.394399 | 5.703816 | 6.152711 | 7.847841 |
sns.set(font_scale = 2)
ax = sns.boxplot(x="brand_name", y="selfie_camera_mp", data=selfiedf)
ax.set_xticklabels(ax.get_xticklabels(),rotation = 90)
sns.set(rc = {'figure.figsize':(30,20)})
plt.show()
Above is the boxplot and summary statistics for brand name devices with selfie cameras larger than 8 MP to better understand the distributions of devices with greater than 8 MP selfie cameras.
Which attributes are highly correlated with the normalized price of a used device?
df.corr()
| screen_size | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| screen_size | 1.000000 | 0.150316 | 0.271640 | 0.071291 | 0.274449 | 0.813533 | 0.828890 | 0.364223 | -0.291723 | 0.614785 | 0.460889 |
| main_camera_mp | 0.150316 | 1.000000 | 0.429264 | 0.018766 | 0.260802 | 0.248563 | -0.087738 | 0.353728 | -0.144672 | 0.587302 | 0.541655 |
| selfie_camera_mp | 0.271640 | 0.429264 | 1.000000 | 0.296426 | 0.477411 | 0.369709 | -0.004997 | 0.690942 | -0.552636 | 0.608074 | 0.475479 |
| int_memory | 0.071291 | 0.018766 | 0.296426 | 1.000000 | 0.122496 | 0.117736 | 0.014948 | 0.235429 | -0.242712 | 0.190993 | 0.195992 |
| ram | 0.274449 | 0.260802 | 0.477411 | 0.122496 | 1.000000 | 0.280740 | 0.089916 | 0.314203 | -0.280066 | 0.520289 | 0.532056 |
| battery | 0.813533 | 0.248563 | 0.369709 | 0.117736 | 0.280740 | 1.000000 | 0.703388 | 0.488660 | -0.370895 | 0.613619 | 0.470754 |
| weight | 0.828890 | -0.087738 | -0.004997 | 0.014948 | 0.089916 | 0.703388 | 1.000000 | 0.071290 | -0.067470 | 0.382354 | 0.269540 |
| release_year | 0.364223 | 0.353728 | 0.690942 | 0.235429 | 0.314203 | 0.488660 | 0.071290 | 1.000000 | -0.750390 | 0.509790 | 0.303689 |
| days_used | -0.291723 | -0.144672 | -0.552636 | -0.242712 | -0.280066 | -0.370895 | -0.067470 | -0.750390 | 1.000000 | -0.358264 | -0.216625 |
| normalized_used_price | 0.614785 | 0.587302 | 0.608074 | 0.190993 | 0.520289 | 0.613619 | 0.382354 | 0.509790 | -0.358264 | 1.000000 | 0.834496 |
| normalized_new_price | 0.460889 | 0.541655 | 0.475479 | 0.195992 | 0.532056 | 0.470754 | 0.269540 | 0.303689 | -0.216625 | 0.834496 | 1.000000 |
numeric_columns = df.select_dtypes(include=np.number).columns.tolist()
numeric_columns.remove('release_year') # dropping year column as it is temporal variable
# correlation heatmap
sns.set(font_scale = 2)
plt.figure(figsize=(15, 7))
sns.heatmap(
df[numeric_columns].corr(),
annot=True,
vmin=-1,
vmax=1,
fmt=".2f",
cmap="Spectral",
)
plt.show()
Normalized new price correlates the highest with 83%. The next highest correlation attributes are in the 40-50% correlation, so not particularly high, but worth looking into (main_camera_mp (54%), ram (53%), battery (47%), screen_size (46%)).
df.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 179 selfie_camera_mp 2 int_memory 4 ram 4 battery 6 weight 7 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
Main Camera MP has the most missing values with 179 missing. Weight is next with 7 values missing, followed by battery with 6 missing values, internal memory and ram with 4 values each, finally selfie camera mp has 2 missing values. Let's explore the main camera mp in more detail
df[df.main_camera_mp.isnull()]
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 59 | Infinix | Android | 17.32 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 6000.0 | 209.0 | 2020 | 245 | 4.282068 | 4.597642 |
| 60 | Infinix | Android | 15.39 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 5000.0 | 185.0 | 2020 | 173 | 4.363608 | 4.711780 |
| 61 | Infinix | Android | 15.39 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 5000.0 | 185.0 | 2020 | 256 | 4.181439 | 4.505460 |
| 62 | Infinix | Android | 15.39 | yes | no | NaN | 16.0 | 32.0 | 3.0 | 4000.0 | 178.0 | 2019 | 316 | 4.555244 | 4.602166 |
| 63 | Infinix | Android | 15.29 | yes | no | NaN | 16.0 | 32.0 | 2.0 | 4000.0 | 165.0 | 2019 | 468 | 4.416670 | 4.871297 |
| 64 | Lava | Android | 14.50 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 3200.0 | 150.0 | 2020 | 356 | 3.989910 | 4.381026 |
| 97 | Meizu | Android | 15.39 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4500.0 | 199.0 | 2020 | 186 | 5.167867 | 6.272028 |
| 98 | Meizu | Android | 15.37 | yes | no | NaN | 16.0 | 128.0 | 6.0 | 4500.0 | 183.0 | 2019 | 449 | 4.665512 | 5.526329 |
| 99 | Meizu | Android | 15.37 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 4000.0 | 184.0 | 2019 | 190 | 4.290185 | 5.195177 |
| 100 | Meizu | Android | 15.29 | yes | no | NaN | 20.0 | 128.0 | 6.0 | 3600.0 | 166.0 | 2019 | 459 | 4.721085 | 5.830679 |
| 101 | Meizu | Android | 15.29 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 4000.0 | 165.0 | 2019 | 517 | 4.842296 | 5.388661 |
| 102 | Meizu | Android | 15.29 | yes | no | NaN | 20.0 | 128.0 | 6.0 | 3600.0 | 165.0 | 2019 | 332 | 4.959412 | 6.040659 |
| 108 | Motorola | Android | 15.37 | yes | no | NaN | 16.0 | 128.0 | 6.0 | 5000.0 | 210.0 | 2020 | 242 | 4.829673 | 5.704849 |
| 109 | Motorola | Android | 15.42 | yes | yes | NaN | 25.0 | 256.0 | 12.0 | 5000.0 | 203.0 | 2020 | 220 | 5.186883 | 7.090785 |
| 118 | OnePlus | Android | 16.64 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4300.0 | 180.0 | 2020 | 110 | 5.620944 | 6.609309 |
| 119 | OnePlus | Android | 16.64 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4300.0 | 180.0 | 2020 | 208 | 5.288166 | 6.474107 |
| 120 | OnePlus | Android | 16.94 | yes | yes | NaN | 16.0 | 256.0 | 12.0 | 4085.0 | 206.0 | 2019 | 458 | 5.713403 | 6.686161 |
| 121 | OnePlus | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4000.0 | 206.0 | 2019 | 294 | 5.670984 | 6.686647 |
| 122 | Oppo | Android | 16.64 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4000.0 | 172.0 | 2020 | 187 | 5.170882 | 6.153456 |
| 123 | Oppo | Android | 16.33 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4000.0 | 183.0 | 2020 | 271 | 5.009635 | 5.910254 |
| 124 | Oppo | Android | 16.69 | yes | yes | NaN | 32.0 | 128.0 | 6.0 | 4000.0 | 184.0 | 2020 | 100 | 5.033896 | 5.672842 |
| 125 | Oppo | Android | 16.64 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4000.0 | 186.0 | 2020 | 186 | 5.116616 | 6.253483 |
| 126 | Oppo | Android | 15.29 | yes | no | NaN | 8.0 | 64.0 | 3.0 | 4230.0 | 168.0 | 2020 | 273 | 4.578313 | 4.782479 |
| 127 | Oppo | Android | 15.34 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4025.0 | 180.0 | 2020 | 146 | 5.412315 | 5.988385 |
| 138 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4300.0 | 191.0 | 2020 | 289 | 4.702660 | 5.302060 |
| 139 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 5000.0 | 199.0 | 2020 | 338 | 4.941571 | 5.014162 |
| 140 | Realme | Android | 15.37 | yes | no | NaN | 5.0 | 32.0 | 3.0 | 5000.0 | 195.0 | 2020 | 111 | 4.498364 | 4.606569 |
| 141 | Realme | Android | 15.27 | yes | no | NaN | 5.0 | 32.0 | 3.0 | 4000.0 | 166.0 | 2020 | 167 | 2.896464 | 3.690877 |
| 142 | Realme | Android | 15.27 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 4000.0 | 166.0 | 2019 | 399 | 4.288677 | 4.611351 |
| 155 | Vivo | Android | 16.36 | yes | no | NaN | 32.0 | 128.0 | 8.0 | 4500.0 | 176.0 | 2020 | 343 | 5.052864 | 5.766695 |
| 156 | Vivo | Android | 16.59 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 5000.0 | 204.1 | 2020 | 246 | 4.801066 | 5.068904 |
| 157 | Vivo | Android | 16.66 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4315.0 | 181.5 | 2020 | 145 | 5.560335 | 6.446117 |
| 158 | Vivo | Android | 16.21 | yes | no | NaN | 16.0 | 128.0 | 8.0 | 4500.0 | 190.2 | 2020 | 300 | 4.847332 | 5.702782 |
| 159 | Vivo | Android | 16.43 | yes | no | NaN | 8.0 | 128.0 | 4.0 | 5000.0 | 197.0 | 2020 | 118 | 4.823261 | 5.249652 |
| 160 | Vivo | Android | 16.69 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4500.0 | 198.1 | 2020 | 312 | 4.954347 | 5.857447 |
| 161 | Vivo | Android | 17.50 | yes | yes | NaN | 16.0 | 256.0 | 8.0 | 4500.0 | 219.5 | 2020 | 226 | 5.466878 | 6.447290 |
| 162 | Vivo | Android | 16.36 | yes | no | NaN | 32.0 | 128.0 | 8.0 | 4500.0 | 176.0 | 2020 | 199 | 5.192623 | 5.599199 |
| 190 | Xiaomi | Android | 16.69 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4520.0 | 208.0 | 2020 | 318 | 4.852499 | 5.664244 |
| 191 | Xiaomi | Android | 16.69 | yes | yes | NaN | 16.0 | 64.0 | 6.0 | 4520.0 | 205.0 | 2020 | 301 | 4.977492 | 5.301313 |
| 192 | Xiaomi | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4800.0 | 208.0 | 2020 | 319 | 5.120924 | 5.523140 |
| 193 | Xiaomi | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4500.0 | 208.0 | 2020 | 184 | 5.026246 | 5.563792 |
| 194 | Xiaomi | Android | 16.94 | yes | no | NaN | 32.0 | 64.0 | 6.0 | 5020.0 | 209.0 | 2020 | 256 | 4.966753 | 5.198718 |
| 195 | Xiaomi | Android | 15.80 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 5000.0 | 188.0 | 2020 | 265 | 4.344065 | 4.495355 |
| 196 | Xiaomi | Android | 15.80 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 5000.0 | 188.0 | 2020 | 321 | 3.421000 | 4.392224 |
| 197 | Xiaomi | Android | 16.94 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 4500.0 | 208.0 | 2020 | 169 | 4.967310 | 5.295513 |
| 204 | ZTE | Android | 16.89 | yes | yes | NaN | 12.0 | 256.0 | 8.0 | 5100.0 | 215.0 | 2020 | 235 | 5.390851 | 6.394694 |
| 205 | ZTE | Android | 16.89 | yes | yes | NaN | 12.0 | 128.0 | 6.0 | 5100.0 | 210.0 | 2020 | 278 | 4.652054 | 5.740147 |
| 278 | Infinix | Android | 17.32 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 6000.0 | 209.0 | 2020 | 320 | 4.405133 | 4.605370 |
| 279 | Infinix | Android | 15.39 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 5000.0 | 185.0 | 2020 | 173 | 4.495913 | 4.702115 |
| 280 | Infinix | Android | 15.39 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 5000.0 | 185.0 | 2020 | 329 | 4.370713 | 4.487287 |
| 281 | Infinix | Android | 15.39 | yes | no | NaN | 16.0 | 32.0 | 3.0 | 4000.0 | 178.0 | 2019 | 356 | 4.417997 | 4.605970 |
| 282 | Infinix | Android | 15.29 | yes | no | NaN | 16.0 | 32.0 | 2.0 | 4000.0 | 165.0 | 2019 | 497 | 4.423289 | 4.866072 |
| 283 | Lava | Android | 14.50 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 3200.0 | 150.0 | 2020 | 359 | 3.918204 | 4.381276 |
| 316 | Meizu | Android | 15.39 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4500.0 | 199.0 | 2020 | 318 | 5.153581 | 6.271536 |
| 317 | Meizu | Android | 15.37 | yes | no | NaN | 16.0 | 128.0 | 6.0 | 4500.0 | 183.0 | 2019 | 348 | 4.749271 | 5.526210 |
| 318 | Meizu | Android | 15.37 | yes | no | NaN | 8.0 | 32.0 | 2.0 | 4000.0 | 184.0 | 2019 | 198 | 4.319353 | 5.191623 |
| 319 | Meizu | Android | 15.29 | yes | no | NaN | 20.0 | 128.0 | 6.0 | 3600.0 | 166.0 | 2019 | 405 | 4.874892 | 5.832352 |
| 320 | Meizu | Android | 15.29 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 4000.0 | 165.0 | 2019 | 499 | 4.805577 | 5.395626 |
| 321 | Meizu | Android | 15.29 | yes | no | NaN | 20.0 | 128.0 | 6.0 | 3600.0 | 165.0 | 2019 | 226 | 5.218841 | 6.042918 |
| 327 | Motorola | Android | 15.37 | yes | no | NaN | 16.0 | 128.0 | 6.0 | 5000.0 | 210.0 | 2020 | 92 | 5.034156 | 5.697295 |
| 328 | Motorola | Android | 15.42 | yes | yes | NaN | 25.0 | 256.0 | 12.0 | 5000.0 | 203.0 | 2020 | 188 | 5.639813 | 7.087983 |
| 337 | OnePlus | Android | 16.64 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4300.0 | 180.0 | 2020 | 186 | 5.482928 | 6.605054 |
| 338 | OnePlus | Android | 16.64 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4300.0 | 180.0 | 2020 | 232 | 5.235165 | 6.478525 |
| 339 | OnePlus | Android | 16.94 | yes | yes | NaN | 16.0 | 256.0 | 12.0 | 4085.0 | 206.0 | 2019 | 280 | 5.079850 | 6.685761 |
| 340 | OnePlus | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4000.0 | 206.0 | 2019 | 523 | 5.342526 | 6.684362 |
| 341 | Oppo | Android | 16.64 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4000.0 | 172.0 | 2020 | 293 | 5.269969 | 6.154455 |
| 342 | Oppo | Android | 16.33 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4000.0 | 183.0 | 2020 | 281 | 5.047867 | 5.911609 |
| 343 | Oppo | Android | 16.69 | yes | yes | NaN | 32.0 | 128.0 | 6.0 | 4000.0 | 184.0 | 2020 | 122 | 5.126045 | 5.668017 |
| 344 | Oppo | Android | 16.64 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4000.0 | 186.0 | 2020 | 200 | 5.094302 | 6.254136 |
| 345 | Oppo | Android | 15.29 | yes | no | NaN | 8.0 | 64.0 | 3.0 | 4230.0 | 168.0 | 2020 | 267 | 4.515027 | 4.785907 |
| 346 | Oppo | Android | 15.34 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4025.0 | 180.0 | 2020 | 218 | 5.307178 | 5.990339 |
| 357 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4300.0 | 191.0 | 2020 | 97 | 4.798514 | 5.299916 |
| 358 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 5000.0 | 199.0 | 2020 | 362 | 4.932169 | 5.010769 |
| 359 | Realme | Android | 15.37 | yes | no | NaN | 5.0 | 32.0 | 3.0 | 5000.0 | 195.0 | 2020 | 321 | 4.383151 | 4.600459 |
| 360 | Realme | Android | 15.27 | yes | no | NaN | 5.0 | 32.0 | 3.0 | 4000.0 | 166.0 | 2020 | 247 | 2.667228 | 3.709907 |
| 361 | Realme | Android | 15.27 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 4000.0 | 166.0 | 2019 | 450 | 4.335721 | 4.601062 |
| 401 | Coolpad | Android | 16.59 | yes | yes | NaN | 16.0 | 64.0 | 4.0 | 4000.0 | 195.0 | 2020 | 252 | 5.174227 | 5.885465 |
| 402 | Coolpad | Android | 16.15 | yes | no | NaN | 13.0 | 32.0 | 3.0 | 4000.0 | 170.0 | 2019 | 342 | 4.588329 | 5.085186 |
| 697 | Asus | Android | 15.90 | yes | no | NaN | 8.0 | 32.0 | 4.0 | 4000.0 | 165.0 | 2019 | 524 | 4.727919 | 5.701346 |
| 698 | Asus | Android | 15.90 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 4000.0 | 165.0 | 2019 | 397 | 4.462800 | 5.738281 |
| 818 | BlackBerry | Android | 10.29 | yes | no | NaN | 8.0 | 32.0 | 4.0 | 3000.0 | 156.0 | 2018 | 552 | 4.061305 | 5.802300 |
| 819 | BlackBerry | Android | 15.21 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4000.0 | 170.0 | 2018 | 629 | 4.693913 | 5.853035 |
| 820 | BlackBerry | Android | 15.21 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4000.0 | 170.0 | 2018 | 383 | 4.946275 | 5.709301 |
| 821 | BlackBerry | Android | 10.29 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 3500.0 | 168.0 | 2018 | 471 | 4.107919 | 6.217444 |
| 972 | Coolpad | Android | 15.29 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 4000.0 | 177.0 | 2018 | 655 | 4.634146 | 5.007096 |
| 1786 | Meizu | Android | 15.29 | yes | no | NaN | 20.0 | 64.0 | 4.0 | 4000.0 | 170.0 | 2019 | 295 | 4.854683 | 5.296115 |
| 2111 | OnePlus | Android | 16.28 | yes | no | NaN | 16.0 | 256.0 | 4.0 | 3700.0 | 185.0 | 2018 | 489 | 5.065755 | 6.552422 |
| 2112 | OnePlus | Android | 16.28 | yes | no | NaN | 16.0 | 256.0 | 4.0 | 3700.0 | 185.0 | 2018 | 496 | 5.000046 | 6.309445 |
| 2113 | OnePlus | Android | 15.95 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 3300.0 | 177.0 | 2018 | 386 | 5.380404 | 6.252636 |
| 2114 | OnePlus | Android | 15.27 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 3300.0 | 162.0 | 2017 | 623 | 5.194733 | 6.214968 |
| 2115 | OnePlus | Android | 12.83 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 3300.0 | 153.0 | 2017 | 875 | 4.735584 | 6.213027 |
| 2201 | Panasonic | Android | 15.72 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 4000.0 | 169.0 | 2018 | 487 | 4.268998 | 5.293707 |
| 2202 | Panasonic | Android | 15.70 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 3000.0 | 195.0 | 2018 | 717 | 4.873058 | 5.856217 |
| 2670 | Sony | Android | 15.37 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 3330.0 | 180.0 | 2019 | 328 | 5.468440 | 6.907495 |
| 2671 | Sony | Android | 15.37 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 3000.0 | 180.0 | 2019 | 252 | 4.772801 | 6.061690 |
| 2672 | Sony | Android | 15.24 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 2870.0 | 162.0 | 2019 | 235 | 4.914271 | 5.862579 |
| 2673 | Sony | Android | 12.88 | yes | no | NaN | 8.0 | 32.0 | 4.0 | 3300.0 | 156.0 | 2019 | 195 | 4.319885 | 5.289024 |
| 3223 | Meizu | Android | 15.39 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4500.0 | 219.0 | 2020 | 261 | 4.822135 | 6.397080 |
| 3224 | Motorola | Android | 15.34 | yes | no | NaN | 8.0 | 32.0 | 3.0 | 4000.0 | 189.4 | 2020 | 349 | 4.349245 | 5.135740 |
| 3225 | Motorola | Android | 15.29 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 3550.0 | 185.0 | 2020 | 174 | 4.511189 | 4.848038 |
| 3226 | Motorola | Android | 15.37 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 5000.0 | 200.0 | 2020 | 352 | 4.454580 | 4.807294 |
| 3227 | Motorola | Android | 15.27 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 3000.0 | 160.0 | 2020 | 208 | 4.201404 | 4.410614 |
| 3228 | Motorola | Android | 15.34 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 4000.0 | 192.0 | 2020 | 186 | 4.670490 | 5.110481 |
| 3229 | Motorola | Android | 15.34 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 5000.0 | 199.0 | 2020 | 97 | 4.652531 | 5.358895 |
| 3230 | Motorola | Android | 15.34 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 5000.0 | 197.0 | 2020 | 202 | 4.718499 | 4.996874 |
| 3248 | OnePlus | Android | 16.94 | yes | no | NaN | 16.0 | 256.0 | 8.0 | 4085.0 | 206.0 | 2019 | 341 | 5.533073 | 6.312841 |
| 3249 | OnePlus | Android | 16.64 | yes | no | NaN | 16.0 | 128.0 | 8.0 | 3800.0 | 190.0 | 2019 | 381 | 5.137327 | 5.946703 |
| 3250 | Oppo | Android | 15.37 | yes | yes | NaN | 32.0 | 256.0 | 12.0 | 4025.0 | 171.0 | 2020 | 227 | 5.447728 | 6.232743 |
| 3251 | Oppo | Android | 15.34 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4025.0 | 180.0 | 2020 | 306 | 4.998090 | 5.826442 |
| 3252 | Oppo | Android | 15.42 | yes | yes | NaN | 32.0 | 256.0 | 12.0 | 4260.0 | 217.0 | 2020 | 193 | 5.742009 | 6.840536 |
| 3253 | Oppo | Android | 15.42 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4200.0 | 209.0 | 2020 | 91 | 5.112771 | 6.714765 |
| 3260 | Realme | Android | 16.36 | yes | yes | NaN | 8.0 | 128.0 | 6.0 | 4200.0 | 209.0 | 2020 | 148 | 4.755313 | 5.979493 |
| 3261 | Realme | Android | 16.69 | yes | yes | NaN | 8.0 | 128.0 | 6.0 | 4200.0 | 194.0 | 2020 | 232 | 4.822859 | 5.458265 |
| 3262 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 64.0 | 3.0 | 5000.0 | 199.0 | 2020 | 320 | 4.664099 | 4.938208 |
| 3263 | Realme | Android | 15.39 | yes | no | NaN | 8.0 | 64.0 | 6.0 | 4300.0 | 202.0 | 2020 | 347 | 4.418600 | 5.354934 |
| 3264 | Realme | Android | 16.36 | yes | yes | NaN | 8.0 | 128.0 | 6.0 | 4200.0 | 205.0 | 2020 | 257 | 5.251226 | 6.175327 |
| 3265 | Realme | Android | 15.37 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 5000.0 | 195.0 | 2020 | 357 | 4.189503 | 4.665324 |
| 3266 | Realme | Android | 16.69 | yes | yes | NaN | 8.0 | 64.0 | 6.0 | 4200.0 | 202.0 | 2020 | 201 | 4.694371 | 5.318071 |
| 3267 | Realme | Android | 16.56 | yes | no | NaN | 8.0 | 32.0 | 3.0 | 5000.0 | 195.0 | 2020 | 278 | 4.489872 | 4.779039 |
| 3268 | Realme | Android | 15.37 | yes | no | NaN | 13.0 | 64.0 | 4.0 | 5000.0 | 198.0 | 2019 | 299 | 4.700844 | 4.967380 |
| 3269 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 4000.0 | 199.0 | 2019 | 475 | 5.069156 | 5.901047 |
| 3270 | Realme | Android | 15.34 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4000.0 | 183.0 | 2019 | 264 | 4.624482 | 5.159974 |
| 3271 | Realme | Android | 15.32 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4035.0 | 184.0 | 2019 | 538 | 4.515574 | 5.074424 |
| 3272 | Realme | Android | 15.32 | yes | no | NaN | 25.0 | 64.0 | 4.0 | 4045.0 | 172.0 | 2019 | 277 | 4.882196 | 4.966753 |
| 3273 | Sony | Android | 15.37 | yes | yes | NaN | 8.0 | 256.0 | 8.0 | 4000.0 | 181.4 | 2020 | 236 | 5.483427 | 6.839637 |
| 3274 | Sony | Android | 15.24 | yes | no | NaN | 8.0 | 128.0 | 4.0 | 3600.0 | 151.0 | 2020 | 349 | 4.937132 | 5.602672 |
| 3275 | Sony | Android | 15.27 | yes | no | NaN | 8.0 | 128.0 | 6.0 | 3140.0 | 164.0 | 2019 | 348 | 5.124381 | 5.973020 |
| 3280 | Vivo | Android | 16.66 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4315.0 | 181.5 | 2020 | 165 | 5.305591 | 6.301831 |
| 3281 | Vivo | Android | 16.59 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4500.0 | 190.0 | 2020 | 197 | 4.812835 | 5.507281 |
| 3282 | Vivo | Android | 16.69 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4500.0 | 195.0 | 2020 | 352 | 5.147087 | 5.680957 |
| 3283 | Vivo | Android | 16.59 | yes | no | NaN | 16.0 | 128.0 | 8.0 | 5000.0 | 197.0 | 2020 | 108 | 5.061075 | 5.358848 |
| 3284 | Vivo | Android | 16.69 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 5000.0 | 201.0 | 2020 | 135 | 4.938280 | 5.532876 |
| 3285 | Vivo | Android | 16.36 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4400.0 | 214.5 | 2020 | 154 | 4.791816 | 6.191667 |
| 3293 | Xiaomi | Android | 16.59 | yes | no | NaN | 8.0 | 32.0 | 3.0 | 5020.0 | 198.0 | 2020 | 340 | 4.475062 | 4.705016 |
| 3294 | Xiaomi | Android | 16.59 | yes | no | NaN | 13.0 | 128.0 | 4.0 | 5020.0 | 199.0 | 2020 | 354 | 4.944068 | 5.135740 |
| 3295 | Xiaomi | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4700.0 | 219.0 | 2020 | 128 | 4.774153 | 5.882486 |
| 3296 | Xiaomi | Android | 16.94 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 5020.0 | 209.0 | 2020 | 284 | 4.870453 | 5.262431 |
| 3297 | Xiaomi | Android | 16.59 | yes | no | NaN | 13.0 | 64.0 | 3.0 | 5020.0 | 199.0 | 2020 | 208 | 4.497028 | 4.912655 |
| 3298 | Xiaomi | Android | 16.43 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 5260.0 | 204.0 | 2020 | 200 | 4.781138 | 5.420933 |
| 3299 | Xiaomi | Android | 16.69 | yes | yes | NaN | 16.0 | 64.0 | 6.0 | 4160.0 | 192.0 | 2020 | 156 | 5.063164 | 5.692519 |
| 3300 | Xiaomi | Android | 16.69 | yes | yes | NaN | 16.0 | 64.0 | 6.0 | 4160.0 | 192.0 | 2020 | 351 | 4.829992 | 5.508092 |
| 3301 | Xiaomi | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 8.0 | 4700.0 | 218.0 | 2020 | 194 | 5.500523 | 6.250921 |
| 3302 | Xiaomi | Android | 16.94 | yes | yes | NaN | 16.0 | 128.0 | 6.0 | 4700.0 | 218.0 | 2020 | 306 | 5.136151 | 6.023665 |
| 3303 | Xiaomi | Android | 16.94 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 5020.0 | 209.0 | 2020 | 96 | 4.686934 | 5.059235 |
| 3304 | Xiaomi | Android | 17.81 | yes | yes | NaN | 20.0 | 256.0 | 8.0 | 5000.0 | 253.0 | 2020 | 361 | 5.223917 | 6.509499 |
| 3305 | Xiaomi | Android | 16.94 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4720.0 | 222.0 | 2020 | 164 | 5.119191 | 6.215907 |
| 3306 | Xiaomi | Android | 16.94 | yes | yes | NaN | 20.0 | 256.0 | 8.0 | 4500.0 | 208.0 | 2020 | 112 | 5.341329 | 6.570869 |
| 3307 | Xiaomi | Android | 16.94 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4780.0 | 208.0 | 2020 | 321 | 4.865610 | 6.158228 |
| 3326 | ZTE | Android | 16.43 | yes | yes | NaN | 20.0 | 128.0 | 6.0 | 4000.0 | 168.0 | 2020 | 217 | 5.371242 | 6.232743 |
| 3327 | ZTE | Android | 16.89 | yes | yes | NaN | 8.0 | 128.0 | 8.0 | 4500.0 | 218.0 | 2020 | 133 | 5.103883 | 6.198784 |
| 3364 | Meizu | Android | 15.39 | yes | yes | NaN | 20.0 | 128.0 | 8.0 | 4500.0 | 219.0 | 2020 | 139 | 5.569260 | 6.397080 |
| 3365 | Motorola | Android | 15.34 | yes | no | NaN | 8.0 | 32.0 | 3.0 | 4000.0 | 189.4 | 2020 | 101 | 4.579032 | 5.135740 |
| 3366 | Motorola | Android | 15.29 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 3550.0 | 185.0 | 2020 | 230 | 4.290185 | 4.848038 |
| 3367 | Motorola | Android | 15.37 | yes | no | NaN | 8.0 | 64.0 | 4.0 | 5000.0 | 200.0 | 2020 | 277 | 4.464528 | 4.807294 |
| 3368 | Motorola | Android | 15.27 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 3000.0 | 160.0 | 2020 | 249 | 4.149779 | 4.410614 |
| 3369 | Motorola | Android | 15.34 | yes | no | NaN | 16.0 | 128.0 | 4.0 | 4000.0 | 192.0 | 2020 | 176 | 4.876494 | 5.110481 |
| 3370 | Motorola | Android | 15.34 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 5000.0 | 199.0 | 2020 | 156 | 4.650717 | 5.358895 |
| 3371 | Motorola | Android | 15.34 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 5000.0 | 197.0 | 2020 | 364 | 4.743279 | 4.996874 |
| 3389 | OnePlus | Android | 16.94 | yes | no | NaN | 16.0 | 256.0 | 8.0 | 4085.0 | 206.0 | 2019 | 367 | 5.415345 | 6.312841 |
| 3390 | OnePlus | Android | 16.64 | yes | no | NaN | 16.0 | 128.0 | 8.0 | 3800.0 | 190.0 | 2019 | 476 | 4.947127 | 5.946703 |
| 3391 | Oppo | Android | 15.37 | yes | yes | NaN | 32.0 | 256.0 | 12.0 | 4025.0 | 171.0 | 2020 | 348 | 5.630315 | 6.232743 |
| 3392 | Oppo | Android | 15.34 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4025.0 | 180.0 | 2020 | 326 | 5.364292 | 5.826442 |
| 3393 | Oppo | Android | 15.42 | yes | yes | NaN | 32.0 | 256.0 | 12.0 | 4260.0 | 217.0 | 2020 | 159 | 5.505982 | 6.840536 |
| 3394 | Oppo | Android | 15.42 | yes | yes | NaN | 32.0 | 128.0 | 8.0 | 4200.0 | 209.0 | 2020 | 261 | 5.792038 | 6.714765 |
| 3401 | Realme | Android | 16.36 | yes | yes | NaN | 8.0 | 128.0 | 6.0 | 4200.0 | 209.0 | 2020 | 219 | 4.657668 | 5.979493 |
| 3402 | Realme | Android | 16.69 | yes | yes | NaN | 8.0 | 128.0 | 6.0 | 4200.0 | 194.0 | 2020 | 258 | 5.031287 | 5.458265 |
| 3403 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 64.0 | 3.0 | 5000.0 | 199.0 | 2020 | 206 | 4.633174 | 4.938208 |
| 3404 | Realme | Android | 15.39 | yes | no | NaN | 8.0 | 64.0 | 6.0 | 4300.0 | 202.0 | 2020 | 231 | 4.800819 | 5.354934 |
| 3405 | Realme | Android | 16.36 | yes | yes | NaN | 8.0 | 128.0 | 6.0 | 4200.0 | 205.0 | 2020 | 141 | 5.259992 | 6.175327 |
| 3406 | Realme | Android | 15.37 | yes | no | NaN | 5.0 | 32.0 | 2.0 | 5000.0 | 195.0 | 2020 | 279 | 4.465218 | 4.665324 |
| 3407 | Realme | Android | 16.69 | yes | yes | NaN | 8.0 | 64.0 | 6.0 | 4200.0 | 202.0 | 2020 | 282 | 4.912802 | 5.318071 |
| 3408 | Realme | Android | 16.56 | yes | no | NaN | 8.0 | 32.0 | 3.0 | 5000.0 | 195.0 | 2020 | 113 | 4.377265 | 4.779039 |
| 3409 | Realme | Android | 15.37 | yes | no | NaN | 13.0 | 64.0 | 4.0 | 5000.0 | 198.0 | 2019 | 293 | 4.487737 | 4.967380 |
| 3410 | Realme | Android | 15.37 | yes | no | NaN | 16.0 | 64.0 | 6.0 | 4000.0 | 199.0 | 2019 | 181 | 4.895598 | 5.901047 |
| 3411 | Realme | Android | 15.34 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4000.0 | 183.0 | 2019 | 503 | 4.570475 | 5.159974 |
| 3412 | Realme | Android | 15.32 | yes | no | NaN | 16.0 | 64.0 | 4.0 | 4035.0 | 184.0 | 2019 | 433 | 4.521027 | 5.074424 |
| 3413 | Realme | Android | 15.32 | yes | no | NaN | 25.0 | 64.0 | 4.0 | 4045.0 | 172.0 | 2019 | 288 | 4.778199 | 4.966753 |
| 3448 | Asus | Android | 16.74 | yes | no | NaN | 24.0 | 128.0 | 8.0 | 6000.0 | 240.0 | 2019 | 325 | 5.715316 | 7.059317 |
| 3449 | Asus | Android | 15.34 | yes | no | NaN | 8.0 | 64.0 | 6.0 | 5000.0 | 190.0 | 2019 | 232 | 4.492337 | 6.483872 |
histogram_boxplot(df,'main_camera_mp')
df.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| screen_size | 3454.0 | 13.713115 | 3.805280 | 5.080000 | 12.700000 | 12.830000 | 15.340000 | 30.710000 |
| main_camera_mp | 3275.0 | 9.460208 | 4.815461 | 0.080000 | 5.000000 | 8.000000 | 13.000000 | 48.000000 |
| selfie_camera_mp | 3452.0 | 6.554229 | 6.970372 | 0.000000 | 2.000000 | 5.000000 | 8.000000 | 32.000000 |
| int_memory | 3450.0 | 54.573099 | 84.972371 | 0.010000 | 16.000000 | 32.000000 | 64.000000 | 1024.000000 |
| ram | 3450.0 | 4.036122 | 1.365105 | 0.020000 | 4.000000 | 4.000000 | 4.000000 | 12.000000 |
| battery | 3448.0 | 3133.402697 | 1299.682844 | 500.000000 | 2100.000000 | 3000.000000 | 4000.000000 | 9720.000000 |
| weight | 3447.0 | 182.751871 | 88.413228 | 69.000000 | 142.000000 | 160.000000 | 185.000000 | 855.000000 |
| release_year | 3454.0 | 2015.965258 | 2.298455 | 2013.000000 | 2014.000000 | 2015.500000 | 2018.000000 | 2020.000000 |
| days_used | 3454.0 | 674.869716 | 248.580166 | 91.000000 | 533.500000 | 690.500000 | 868.750000 | 1094.000000 |
| normalized_used_price | 3454.0 | 4.364712 | 0.588914 | 1.536867 | 4.033931 | 4.405133 | 4.755700 | 6.619433 |
| normalized_new_price | 3454.0 | 5.233107 | 0.683637 | 2.901422 | 4.790342 | 5.245892 | 5.673718 | 7.847841 |
sns.set(font_scale = 2)
plt.scatter(x='main_camera_mp', y='selfie_camera_mp', data=df)
<matplotlib.collections.PathCollection at 0x1e3bc55eeb0>
Checking to see if there is any correlation between the selfie camera MP and the main camera MP.
Testdf = df.dropna()
x = Testdf['main_camera_mp'].to_numpy()
y = Testdf['selfie_camera_mp'].to_numpy()
np.corrcoef(x, y)
array([[1. , 0.4284045],
[0.4284045, 1. ]])
42% correlation is quite low, so not best to correlate the main camera and selfie camera MPs.
The median value of 8 MP seems like a good fit for the missing values. The mean value is 9.46 which does not make sense as a MP value for a main camera. Therefore we will fill all missing values with the median of 8.
Updatedf = df.copy()
Updatedf["main_camera_mp"].fillna(8, inplace=True)
# checking missing values in rest of the data
Updatedf.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 2 int_memory 4 ram 4 battery 6 weight 7 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
df.groupby("brand_name")["selfie_camera_mp"].median()
brand_name Acer 2.0 Alcatel 1.3 Apple 7.0 Asus 5.0 BlackBerry 2.0 Celkon 0.3 Coolpad 8.0 Gionee 5.0 Google 8.0 HTC 5.0 Honor 8.0 Huawei 8.0 Infinix 8.0 Karbonn 0.3 LG 5.0 Lava 2.0 Lenovo 5.0 Meizu 8.0 Micromax 2.0 Microsoft 5.0 Motorola 5.0 Nokia 1.6 OnePlus 16.0 Oppo 16.0 Others 2.0 Panasonic 5.0 Realme 8.0 Samsung 5.0 Sony 5.0 Spice 1.3 Vivo 16.0 XOLO 1.3 Xiaomi 8.0 ZTE 5.0 Name: selfie_camera_mp, dtype: float64
df[df.selfie_camera_mp.isnull()]
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1080 | Android | 15.32 | yes | no | 12.2 | NaN | 64.0 | 4.0 | 3430.0 | 184.0 | 2018 | 475 | 5.573826 | 6.865974 | |
| 1081 | Android | 12.83 | yes | no | 12.2 | NaN | 64.0 | 4.0 | 2915.0 | 148.0 | 2018 | 424 | 4.464988 | 6.745083 |
Median selfie cameras for the Google brand name is 8 MP. Since both missing values are Google brand, we will replace those missing values with 8 MP.
Updatedf["selfie_camera_mp"].fillna(8, inplace=True)
# checking missing values in rest of the data
Updatedf.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 0 int_memory 4 ram 4 battery 6 weight 7 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
df[df.weight.isnull()]
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3002 | XOLO | Android | 12.70 | yes | no | 13.0 | 5.0 | 32.0 | 4.0 | 2400.0 | NaN | 2015 | 576 | 4.165890 | 4.930437 |
| 3003 | XOLO | Android | 12.83 | yes | no | 8.0 | 5.0 | 16.0 | 4.0 | 3200.0 | NaN | 2015 | 800 | 4.282068 | 5.189228 |
| 3004 | XOLO | Android | 12.70 | no | no | 8.0 | 2.0 | 32.0 | 4.0 | 2100.0 | NaN | 2015 | 878 | 3.879706 | 4.081090 |
| 3005 | XOLO | Android | 10.29 | no | no | 5.0 | 0.3 | 32.0 | 4.0 | 1800.0 | NaN | 2015 | 1036 | 3.823847 | 4.396053 |
| 3006 | XOLO | Android | 12.70 | no | no | 5.0 | 0.3 | 16.0 | 4.0 | 2500.0 | NaN | 2015 | 679 | 3.837084 | 4.347176 |
| 3007 | XOLO | Windows | 12.70 | no | no | 8.0 | 2.0 | 32.0 | 4.0 | 2200.0 | NaN | 2015 | 838 | 3.707210 | 4.791733 |
| 3008 | XOLO | Android | 12.70 | no | no | 8.0 | 5.0 | 32.0 | 4.0 | 2500.0 | NaN | 2015 | 1045 | 4.184642 | 4.785406 |
df.groupby("brand_name")["weight"].median()
brand_name Acer 155.00 Alcatel 149.00 Apple 299.00 Asus 170.00 BlackBerry 167.50 Celkon 140.00 Coolpad 155.00 Gionee 151.50 Google 167.00 HTC 155.00 Honor 171.75 Huawei 175.00 Infinix 185.00 Karbonn 126.50 LG 152.00 Lava 137.00 Lenovo 165.40 Meizu 159.50 Micromax 143.00 Microsoft 146.00 Motorola 166.00 Nokia 145.00 OnePlus 180.00 Oppo 169.50 Others 154.50 Panasonic 146.70 Realme 195.00 Samsung 165.00 Sony 154.50 Spice 158.00 Vivo 167.80 XOLO 126.00 Xiaomi 180.00 ZTE 157.50 Name: weight, dtype: float64
branddf = df.loc[(df['brand_name'] == "XOLO")]
branddf.head()
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3002 | XOLO | Android | 12.70 | yes | no | 13.0 | 5.0 | 32.0 | 4.0 | 2400.0 | NaN | 2015 | 576 | 4.165890 | 4.930437 |
| 3003 | XOLO | Android | 12.83 | yes | no | 8.0 | 5.0 | 16.0 | 4.0 | 3200.0 | NaN | 2015 | 800 | 4.282068 | 5.189228 |
| 3004 | XOLO | Android | 12.70 | no | no | 8.0 | 2.0 | 32.0 | 4.0 | 2100.0 | NaN | 2015 | 878 | 3.879706 | 4.081090 |
| 3005 | XOLO | Android | 10.29 | no | no | 5.0 | 0.3 | 32.0 | 4.0 | 1800.0 | NaN | 2015 | 1036 | 3.823847 | 4.396053 |
| 3006 | XOLO | Android | 12.70 | no | no | 5.0 | 0.3 | 16.0 | 4.0 | 2500.0 | NaN | 2015 | 679 | 3.837084 | 4.347176 |
histogram_boxplot(branddf,'weight')
The mean weight for the XOLO brand is right skewed, so replacing missing values with the median weight (126) makes more sense.
Updatedf["weight"].fillna(126, inplace=True)
# checking missing values in rest of the data
Updatedf.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 0 int_memory 4 ram 4 battery 6 weight 0 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
df[df.int_memory.isnull()]
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 117 | Nokia | Others | 5.18 | yes | no | 0.3 | 0.0 | NaN | 0.02 | 1200.0 | 86.5 | 2019 | 234 | 2.721295 | 3.688379 |
| 2035 | Nokia | Others | 5.18 | no | no | 5.0 | 0.0 | NaN | 0.03 | 1020.0 | 157.0 | 2019 | 501 | 2.343727 | 3.420346 |
| 2064 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | NaN | 0.02 | 1100.0 | 78.4 | 2015 | 559 | 2.587012 | 3.378611 |
| 2092 | Nokia | Others | 7.62 | no | no | 5.0 | 0.0 | NaN | 0.02 | 1010.0 | 100.0 | 2013 | 1043 | 3.535728 | 4.370586 |
nokiadf = df[df["brand_name"] =='Nokia'].copy()
nokiadf.head()
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 110 | Nokia | Android | 16.56 | yes | no | 0.3 | 8.0 | 64.00 | 3.00 | 4000.0 | 200.0 | 2020 | 310 | 4.186012 | 5.004885 |
| 111 | Nokia | Android | 13.84 | yes | no | 0.3 | 5.0 | 16.00 | 2.00 | 3000.0 | 180.0 | 2020 | 193 | 3.616309 | 4.100161 |
| 112 | Nokia | Android | 13.84 | yes | no | 0.3 | 5.0 | 16.00 | 2.00 | 3000.0 | 180.0 | 2020 | 356 | 4.055430 | 4.591984 |
| 113 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.10 | 0.03 | 1020.0 | 90.5 | 2020 | 272 | 2.910174 | 3.399863 |
| 114 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.06 | NaN | 1020.0 | 91.3 | 2020 | 288 | 2.729159 | 2.911263 |
nokiadf.groupby("os")["int_memory"].median()
os Android 32.00 Others 0.06 Windows 32.00 iOS NaN Name: int_memory, dtype: float64
Since the os for all missing values is Others, we can see the median value for those is .06. We will replace all missing values with .06.
Updatedf["int_memory"].fillna(.06, inplace=True)
# checking missing values in rest of the data
Updatedf.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 0 int_memory 0 ram 4 battery 6 weight 0 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
df[df.ram.isnull()]
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 114 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.06 | NaN | 1020.0 | 91.3 | 2020 | 288 | 2.729159 | 2.911263 |
| 335 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.10 | NaN | 1200.0 | 88.2 | 2020 | 327 | 3.062924 | 3.689129 |
| 2059 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.06 | NaN | NaN | 82.6 | 2016 | 1023 | 2.765060 | 3.657905 |
| 2090 | Nokia | Others | 7.62 | no | no | 5.0 | 0.0 | 0.06 | NaN | 1200.0 | 111.4 | 2013 | 1001 | 3.827771 | 4.605770 |
nokiadf.describe()
| screen_size | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 106.000000 | 106.000000 | 106.000000 | 102.000000 | 102.000000 | 104.000000 | 106.000000 | 106.000000 | 106.000000 | 106.000000 | 106.000000 |
| mean | 10.496321 | 5.839623 | 4.035849 | 25.595882 | 2.420294 | 2259.615385 | 144.883019 | 2016.396226 | 616.849057 | 3.911802 | 4.668024 |
| std | 4.404223 | 5.943146 | 5.674487 | 29.094421 | 1.889325 | 1079.475712 | 62.529011 | 2.671469 | 268.293648 | 0.676786 | 0.796993 |
| min | 5.180000 | 0.300000 | 0.000000 | 0.060000 | 0.020000 | 1000.000000 | 78.400000 | 2013.000000 | 127.000000 | 2.343727 | 2.911263 |
| 25% | 5.280000 | 2.000000 | 0.000000 | 0.100000 | 0.030000 | 1200.000000 | 100.700000 | 2014.000000 | 396.000000 | 3.424931 | 4.085808 |
| 50% | 10.290000 | 5.000000 | 1.600000 | 16.000000 | 4.000000 | 2000.000000 | 145.000000 | 2017.000000 | 621.000000 | 3.988336 | 4.604369 |
| 75% | 14.335000 | 8.000000 | 5.000000 | 32.000000 | 4.000000 | 3015.000000 | 172.000000 | 2019.000000 | 816.750000 | 4.460347 | 5.127292 |
| max | 25.430000 | 41.000000 | 24.000000 | 128.000000 | 6.000000 | 5300.000000 | 615.000000 | 2020.000000 | 1093.000000 | 5.314043 | 6.429235 |
nokiadf.groupby("os")["ram"].median()
os Android 4.00 Others 0.02 Windows 4.00 iOS NaN Name: ram, dtype: float64
Since the os for all missing values is Others, we can see the median value for those is .02. We will replace all missing values with .02.
Updatedf["ram"].fillna(.02, inplace=True)
# checking missing values in rest of the data
Updatedf.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 0 int_memory 0 ram 0 battery 6 weight 0 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
df[df.battery.isnull()]
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1829 | Meizu | Android | 12.83 | yes | no | 13.0 | 5.0 | 16.00 | 4.00 | NaN | 145.0 | 2014 | 986 | 4.177919 | 4.863604 |
| 1831 | Meizu | Android | 12.83 | yes | no | 20.7 | 5.0 | 16.00 | 4.00 | NaN | 158.0 | 2014 | 1043 | 4.878855 | 5.990639 |
| 1832 | Meizu | Android | 13.61 | yes | no | 20.7 | 2.0 | 16.00 | 4.00 | NaN | 147.0 | 2014 | 1007 | 4.742320 | 5.826118 |
| 1962 | Microsoft | Windows | 25.55 | no | no | 5.0 | 3.5 | 32.00 | 4.00 | NaN | 675.9 | 2013 | 931 | 5.230574 | 5.802753 |
| 2058 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.06 | 0.02 | NaN | 81.0 | 2016 | 815 | 2.718660 | 3.374511 |
| 2059 | Nokia | Others | 5.18 | no | no | 0.3 | 0.0 | 0.06 | NaN | NaN | 82.6 | 2016 | 1023 | 2.765060 | 3.657905 |
df.groupby("brand_name")["battery"].median()
brand_name Acer 2700.0 Alcatel 2500.0 Apple 3937.0 Asus 3300.0 BlackBerry 2840.0 Celkon 1400.0 Coolpad 2650.0 Gionee 3000.0 Google 3430.0 HTC 2600.0 Honor 3750.0 Huawei 4000.0 Infinix 5000.0 Karbonn 1625.0 LG 3000.0 Lava 2000.0 Lenovo 3450.0 Meizu 3150.0 Micromax 2000.0 Microsoft 2500.0 Motorola 3000.0 Nokia 2000.0 OnePlus 3800.0 Oppo 3600.0 Others 2500.0 Panasonic 2500.0 Realme 4200.0 Samsung 3000.0 Sony 2870.0 Spice 2100.0 Vivo 3900.0 XOLO 2100.0 Xiaomi 4000.0 ZTE 2975.0 Name: battery, dtype: float64
df.groupby("brand_name")["battery"].mean()
brand_name Acer 2999.607843 Alcatel 2452.975207 Apple 4628.910256 Asus 3499.836066 BlackBerry 2822.500000 Celkon 1475.757576 Coolpad 2945.000000 Gionee 3117.946429 Google 3628.333333 HTC 2614.454545 Honor 3690.517241 Huawei 3745.159363 Infinix 4800.000000 Karbonn 1724.137931 LG 3033.980100 Lava 2248.611111 Lenovo 3618.479532 Meizu 3368.220339 Micromax 2224.786325 Microsoft 2382.380952 Motorola 3312.594340 Nokia 2259.615385 OnePlus 3716.590909 Oppo 3380.581395 Others 2821.952191 Panasonic 2682.978723 Realme 4337.317073 Samsung 3362.683284 Sony 3003.372093 Spice 2191.000000 Vivo 3718.846154 XOLO 2220.000000 Xiaomi 3979.848485 ZTE 2894.821429 Name: battery, dtype: float64
histogram_boxplot(branddf,'battery')
There is a large difference between the mean and median battery sizes. Since the missing values are across brands and os, I think it's best to drop these 7 rows.
Updatedf.dropna(subset=["battery"], inplace=True)
# checking missing values in rest of the data
Updatedf.isnull().sum()
brand_name 0 os 0 screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 0 int_memory 0 ram 0 battery 0 weight 0 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 dtype: int64
No more missing values in the Updated dataframe.
Updatedf.describe()
| screen_size | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 | 3448.000000 |
| mean | 13.715174 | 9.383463 | 6.561978 | 54.581586 | 4.033834 | 3133.402697 | 182.580684 | 2015.967807 | 674.360499 | 4.365198 | 5.233653 |
| std | 3.797631 | 4.690235 | 6.971985 | 84.998673 | 1.368921 | 1299.682844 | 87.998376 | 2.299167 | 248.475571 | 0.587834 | 0.682677 |
| min | 5.080000 | 0.080000 | 0.000000 | 0.010000 | 0.020000 | 500.000000 | 69.000000 | 2013.000000 | 91.000000 | 1.536867 | 2.901422 |
| 25% | 12.700000 | 5.000000 | 2.000000 | 16.000000 | 4.000000 | 2100.000000 | 142.000000 | 2014.000000 | 532.000000 | 4.034019 | 4.790383 |
| 50% | 12.830000 | 8.000000 | 5.000000 | 32.000000 | 4.000000 | 3000.000000 | 160.000000 | 2016.000000 | 689.500000 | 4.405133 | 5.245892 |
| 75% | 15.340000 | 13.000000 | 8.000000 | 64.000000 | 4.000000 | 4000.000000 | 185.000000 | 2018.000000 | 868.000000 | 4.755442 | 5.672533 |
| max | 30.710000 | 48.000000 | 32.000000 | 1024.000000 | 12.000000 | 9720.000000 | 855.000000 | 2020.000000 | 1094.000000 | 6.619433 | 7.847841 |
Updatedf.head()
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Honor | Android | 14.50 | yes | no | 13.0 | 5.0 | 64.0 | 3.0 | 3020.0 | 146.0 | 2020 | 127 | 4.307572 | 4.715100 |
| 1 | Honor | Android | 17.30 | yes | yes | 13.0 | 16.0 | 128.0 | 8.0 | 4300.0 | 213.0 | 2020 | 325 | 5.162097 | 5.519018 |
| 2 | Honor | Android | 16.69 | yes | yes | 13.0 | 8.0 | 128.0 | 8.0 | 4200.0 | 213.0 | 2020 | 162 | 5.111084 | 5.884631 |
| 3 | Honor | Android | 25.50 | yes | yes | 13.0 | 8.0 | 64.0 | 6.0 | 7250.0 | 480.0 | 2020 | 345 | 5.135387 | 5.630961 |
| 4 | Honor | Android | 15.32 | yes | no | 13.0 | 8.0 | 64.0 | 3.0 | 5000.0 | 185.0 | 2020 | 293 | 4.389995 | 4.947837 |
Updatedf.brand_name.unique()
['Honor', 'Others', 'HTC', 'Huawei', 'Infinix', ..., 'Microsoft', 'Panasonic', 'Sony', 'Spice', 'XOLO'] Length: 34 Categories (34, object): ['Honor', 'Others', 'HTC', 'Huawei', ..., 'Panasonic', 'Sony', 'Spice', 'XOLO']
Updatedf.os.unique()
['Android', 'Others', 'iOS', 'Windows'] Categories (4, object): ['Android', 'Others', 'iOS', 'Windows']
Updatedf['4g'] = Updatedf['4g'].map(
{'yes':1 ,'no':0})
Updatedf.head()
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Honor | Android | 14.50 | 1 | no | 13.0 | 5.0 | 64.0 | 3.0 | 3020.0 | 146.0 | 2020 | 127 | 4.307572 | 4.715100 |
| 1 | Honor | Android | 17.30 | 1 | yes | 13.0 | 16.0 | 128.0 | 8.0 | 4300.0 | 213.0 | 2020 | 325 | 5.162097 | 5.519018 |
| 2 | Honor | Android | 16.69 | 1 | yes | 13.0 | 8.0 | 128.0 | 8.0 | 4200.0 | 213.0 | 2020 | 162 | 5.111084 | 5.884631 |
| 3 | Honor | Android | 25.50 | 1 | yes | 13.0 | 8.0 | 64.0 | 6.0 | 7250.0 | 480.0 | 2020 | 345 | 5.135387 | 5.630961 |
| 4 | Honor | Android | 15.32 | 1 | no | 13.0 | 8.0 | 64.0 | 3.0 | 5000.0 | 185.0 | 2020 | 293 | 4.389995 | 4.947837 |
Updatedf['5g'] = Updatedf['5g'].map(
{'yes':1 ,'no':0})
Updatedf.head()
| brand_name | os | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | int_memory | ram | battery | weight | release_year | days_used | normalized_used_price | normalized_new_price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Honor | Android | 14.50 | 1 | 0 | 13.0 | 5.0 | 64.0 | 3.0 | 3020.0 | 146.0 | 2020 | 127 | 4.307572 | 4.715100 |
| 1 | Honor | Android | 17.30 | 1 | 1 | 13.0 | 16.0 | 128.0 | 8.0 | 4300.0 | 213.0 | 2020 | 325 | 5.162097 | 5.519018 |
| 2 | Honor | Android | 16.69 | 1 | 1 | 13.0 | 8.0 | 128.0 | 8.0 | 4200.0 | 213.0 | 2020 | 162 | 5.111084 | 5.884631 |
| 3 | Honor | Android | 25.50 | 1 | 1 | 13.0 | 8.0 | 64.0 | 6.0 | 7250.0 | 480.0 | 2020 | 345 | 5.135387 | 5.630961 |
| 4 | Honor | Android | 15.32 | 1 | 0 | 13.0 | 8.0 | 64.0 | 3.0 | 5000.0 | 185.0 | 2020 | 293 | 4.389995 | 4.947837 |
Updatedf['4g'] = Updatedf['4g'].astype('int64')
Updatedf['5g'] = Updatedf['5g'].astype('int64')
Updatedf.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 3448 entries, 0 to 3453 Data columns (total 15 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 brand_name 3448 non-null category 1 os 3448 non-null category 2 screen_size 3448 non-null float64 3 4g 3448 non-null int64 4 5g 3448 non-null int64 5 main_camera_mp 3448 non-null float64 6 selfie_camera_mp 3448 non-null float64 7 int_memory 3448 non-null float64 8 ram 3448 non-null float64 9 battery 3448 non-null float64 10 weight 3448 non-null float64 11 release_year 3448 non-null int64 12 days_used 3448 non-null int64 13 normalized_used_price 3448 non-null float64 14 normalized_new_price 3448 non-null float64 dtypes: category(2), float64(9), int64(4) memory usage: 385.4 KB
Updated the 4g and 5g columns from 'yes' and 'no' to 1 and 0 for modeling.
Updatedf = pd.get_dummies(
Updatedf,
columns=Updatedf.select_dtypes(include=["object", "category"]).columns.tolist(),
drop_first=True,
)
# to ensure all variables are of float type
Updatedf = Updatedf.astype(float)
Updatedf.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 3448 entries, 0 to 3453 Data columns (total 49 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 screen_size 3448 non-null float64 1 4g 3448 non-null float64 2 5g 3448 non-null float64 3 main_camera_mp 3448 non-null float64 4 selfie_camera_mp 3448 non-null float64 5 int_memory 3448 non-null float64 6 ram 3448 non-null float64 7 battery 3448 non-null float64 8 weight 3448 non-null float64 9 release_year 3448 non-null float64 10 days_used 3448 non-null float64 11 normalized_used_price 3448 non-null float64 12 normalized_new_price 3448 non-null float64 13 brand_name_Alcatel 3448 non-null float64 14 brand_name_Apple 3448 non-null float64 15 brand_name_Asus 3448 non-null float64 16 brand_name_BlackBerry 3448 non-null float64 17 brand_name_Celkon 3448 non-null float64 18 brand_name_Coolpad 3448 non-null float64 19 brand_name_Gionee 3448 non-null float64 20 brand_name_Google 3448 non-null float64 21 brand_name_HTC 3448 non-null float64 22 brand_name_Honor 3448 non-null float64 23 brand_name_Huawei 3448 non-null float64 24 brand_name_Infinix 3448 non-null float64 25 brand_name_Karbonn 3448 non-null float64 26 brand_name_LG 3448 non-null float64 27 brand_name_Lava 3448 non-null float64 28 brand_name_Lenovo 3448 non-null float64 29 brand_name_Meizu 3448 non-null float64 30 brand_name_Micromax 3448 non-null float64 31 brand_name_Microsoft 3448 non-null float64 32 brand_name_Motorola 3448 non-null float64 33 brand_name_Nokia 3448 non-null float64 34 brand_name_OnePlus 3448 non-null float64 35 brand_name_Oppo 3448 non-null float64 36 brand_name_Others 3448 non-null float64 37 brand_name_Panasonic 3448 non-null float64 38 brand_name_Realme 3448 non-null float64 39 brand_name_Samsung 3448 non-null float64 40 brand_name_Sony 3448 non-null float64 41 brand_name_Spice 3448 non-null float64 42 brand_name_Vivo 3448 non-null float64 43 brand_name_XOLO 3448 non-null float64 44 brand_name_Xiaomi 3448 non-null float64 45 brand_name_ZTE 3448 non-null float64 46 os_Others 3448 non-null float64 47 os_Windows 3448 non-null float64 48 os_iOS 3448 non-null float64 dtypes: float64(49) memory usage: 1.3 MB
Created dummy columns for all categorical columns.
histogram_boxplot(df,'screen_size')
histogram_boxplot(df,'main_camera_mp')
histogram_boxplot(df,'selfie_camera_mp')
histogram_boxplot(df,'int_memory')
histogram_boxplot(df,'ram')
histogram_boxplot(df,'battery')
histogram_boxplot(df,'release_year')
histogram_boxplot(df,'days_used')
histogram_boxplot(df,'normalized_used_price')
histogram_boxplot(df,'normalized_new_price')
There are outliers in the screen size, main camera mp, selfie camera mp, internal memory, ram, battery, weight, normalized used price, and normalized new price. I think these are valid and not worth dropping.
sns.set(font_scale = 2)
cols_to_log = ['int_memory']
for colname in cols_to_log:
plt.hist(df[colname], bins=50)
plt.title(colname)
plt.show()
print(np.sum(df[colname] <= 0))
0
sns.set(font_scale = 2)
plt.hist(np.log(df['int_memory'] + 1), 50)
plt.title('log(int_memory + 1)')
plt.show()
Updatedf['int_memory' + '_log'] = np.log(df['int_memory'] + 1)
Updatedf.drop('int_memory', axis=1, inplace=True)
Internal memory looked to be very skewed, so a log function was applied to create a closer to normal distribution.
Updatedf.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 3448 entries, 0 to 3453 Data columns (total 49 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 screen_size 3448 non-null float64 1 4g 3448 non-null float64 2 5g 3448 non-null float64 3 main_camera_mp 3448 non-null float64 4 selfie_camera_mp 3448 non-null float64 5 ram 3448 non-null float64 6 battery 3448 non-null float64 7 weight 3448 non-null float64 8 release_year 3448 non-null float64 9 days_used 3448 non-null float64 10 normalized_used_price 3448 non-null float64 11 normalized_new_price 3448 non-null float64 12 brand_name_Alcatel 3448 non-null float64 13 brand_name_Apple 3448 non-null float64 14 brand_name_Asus 3448 non-null float64 15 brand_name_BlackBerry 3448 non-null float64 16 brand_name_Celkon 3448 non-null float64 17 brand_name_Coolpad 3448 non-null float64 18 brand_name_Gionee 3448 non-null float64 19 brand_name_Google 3448 non-null float64 20 brand_name_HTC 3448 non-null float64 21 brand_name_Honor 3448 non-null float64 22 brand_name_Huawei 3448 non-null float64 23 brand_name_Infinix 3448 non-null float64 24 brand_name_Karbonn 3448 non-null float64 25 brand_name_LG 3448 non-null float64 26 brand_name_Lava 3448 non-null float64 27 brand_name_Lenovo 3448 non-null float64 28 brand_name_Meizu 3448 non-null float64 29 brand_name_Micromax 3448 non-null float64 30 brand_name_Microsoft 3448 non-null float64 31 brand_name_Motorola 3448 non-null float64 32 brand_name_Nokia 3448 non-null float64 33 brand_name_OnePlus 3448 non-null float64 34 brand_name_Oppo 3448 non-null float64 35 brand_name_Others 3448 non-null float64 36 brand_name_Panasonic 3448 non-null float64 37 brand_name_Realme 3448 non-null float64 38 brand_name_Samsung 3448 non-null float64 39 brand_name_Sony 3448 non-null float64 40 brand_name_Spice 3448 non-null float64 41 brand_name_Vivo 3448 non-null float64 42 brand_name_XOLO 3448 non-null float64 43 brand_name_Xiaomi 3448 non-null float64 44 brand_name_ZTE 3448 non-null float64 45 os_Others 3448 non-null float64 46 os_Windows 3448 non-null float64 47 os_iOS 3448 non-null float64 48 int_memory_log 3444 non-null float64 dtypes: float64(49) memory usage: 1.3 MB
histogram_boxplot(Updatedf,'int_memory_log')
Updatedf.dropna(subset=["int_memory_log"], inplace=True)
Updatedf.isnull().sum()
screen_size 0 4g 0 5g 0 main_camera_mp 0 selfie_camera_mp 0 ram 0 battery 0 weight 0 release_year 0 days_used 0 normalized_used_price 0 normalized_new_price 0 brand_name_Alcatel 0 brand_name_Apple 0 brand_name_Asus 0 brand_name_BlackBerry 0 brand_name_Celkon 0 brand_name_Coolpad 0 brand_name_Gionee 0 brand_name_Google 0 brand_name_HTC 0 brand_name_Honor 0 brand_name_Huawei 0 brand_name_Infinix 0 brand_name_Karbonn 0 brand_name_LG 0 brand_name_Lava 0 brand_name_Lenovo 0 brand_name_Meizu 0 brand_name_Micromax 0 brand_name_Microsoft 0 brand_name_Motorola 0 brand_name_Nokia 0 brand_name_OnePlus 0 brand_name_Oppo 0 brand_name_Others 0 brand_name_Panasonic 0 brand_name_Realme 0 brand_name_Samsung 0 brand_name_Sony 0 brand_name_Spice 0 brand_name_Vivo 0 brand_name_XOLO 0 brand_name_Xiaomi 0 brand_name_ZTE 0 os_Others 0 os_Windows 0 os_iOS 0 int_memory_log 0 dtype: int64
No missing values left in dataset. Data is ready for another EDA and modeling.
Updatedf.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 3444 entries, 0 to 3453 Data columns (total 49 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 screen_size 3444 non-null float64 1 4g 3444 non-null float64 2 5g 3444 non-null float64 3 main_camera_mp 3444 non-null float64 4 selfie_camera_mp 3444 non-null float64 5 ram 3444 non-null float64 6 battery 3444 non-null float64 7 weight 3444 non-null float64 8 release_year 3444 non-null float64 9 days_used 3444 non-null float64 10 normalized_used_price 3444 non-null float64 11 normalized_new_price 3444 non-null float64 12 brand_name_Alcatel 3444 non-null float64 13 brand_name_Apple 3444 non-null float64 14 brand_name_Asus 3444 non-null float64 15 brand_name_BlackBerry 3444 non-null float64 16 brand_name_Celkon 3444 non-null float64 17 brand_name_Coolpad 3444 non-null float64 18 brand_name_Gionee 3444 non-null float64 19 brand_name_Google 3444 non-null float64 20 brand_name_HTC 3444 non-null float64 21 brand_name_Honor 3444 non-null float64 22 brand_name_Huawei 3444 non-null float64 23 brand_name_Infinix 3444 non-null float64 24 brand_name_Karbonn 3444 non-null float64 25 brand_name_LG 3444 non-null float64 26 brand_name_Lava 3444 non-null float64 27 brand_name_Lenovo 3444 non-null float64 28 brand_name_Meizu 3444 non-null float64 29 brand_name_Micromax 3444 non-null float64 30 brand_name_Microsoft 3444 non-null float64 31 brand_name_Motorola 3444 non-null float64 32 brand_name_Nokia 3444 non-null float64 33 brand_name_OnePlus 3444 non-null float64 34 brand_name_Oppo 3444 non-null float64 35 brand_name_Others 3444 non-null float64 36 brand_name_Panasonic 3444 non-null float64 37 brand_name_Realme 3444 non-null float64 38 brand_name_Samsung 3444 non-null float64 39 brand_name_Sony 3444 non-null float64 40 brand_name_Spice 3444 non-null float64 41 brand_name_Vivo 3444 non-null float64 42 brand_name_XOLO 3444 non-null float64 43 brand_name_Xiaomi 3444 non-null float64 44 brand_name_ZTE 3444 non-null float64 45 os_Others 3444 non-null float64 46 os_Windows 3444 non-null float64 47 os_iOS 3444 non-null float64 48 int_memory_log 3444 non-null float64 dtypes: float64(49) memory usage: 1.3 MB
Updatedf.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| screen_size | 3444.0 | 13.724379 | 3.790040 | 5.080000 | 12.700000 | 12.830000 | 15.340000 | 30.710000 |
| 4g | 3444.0 | 0.676829 | 0.467755 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 |
| 5g | 3444.0 | 0.044135 | 0.205424 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| main_camera_mp | 3444.0 | 9.391283 | 4.686652 | 0.080000 | 5.000000 | 8.000000 | 13.000000 | 48.000000 |
| selfie_camera_mp | 3444.0 | 6.569599 | 6.972443 | 0.000000 | 2.000000 | 5.000000 | 8.000000 | 32.000000 |
| ram | 3444.0 | 4.038493 | 1.362866 | 0.020000 | 4.000000 | 4.000000 | 4.000000 | 12.000000 |
| battery | 3444.0 | 3135.784698 | 1298.552592 | 500.000000 | 2100.000000 | 3000.000000 | 4000.000000 | 9720.000000 |
| weight | 3444.0 | 182.670238 | 88.003967 | 69.000000 | 142.000000 | 160.000000 | 185.000000 | 855.000000 |
| release_year | 3444.0 | 2015.967189 | 2.298726 | 2013.000000 | 2014.000000 | 2016.000000 | 2018.000000 | 2020.000000 |
| days_used | 3444.0 | 674.465157 | 248.401772 | 91.000000 | 532.750000 | 690.000000 | 868.000000 | 1094.000000 |
| normalized_used_price | 3444.0 | 4.367019 | 0.585540 | 1.536867 | 4.035479 | 4.405255 | 4.756001 | 6.619433 |
| normalized_new_price | 3444.0 | 5.235418 | 0.680972 | 2.901422 | 4.790820 | 5.246024 | 5.673134 | 7.847841 |
| brand_name_Alcatel | 3444.0 | 0.035134 | 0.184144 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Apple | 3444.0 | 0.011324 | 0.105826 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Asus | 3444.0 | 0.035424 | 0.184876 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_BlackBerry | 3444.0 | 0.006388 | 0.079680 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Celkon | 3444.0 | 0.009582 | 0.097431 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Coolpad | 3444.0 | 0.006388 | 0.079680 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Gionee | 3444.0 | 0.016260 | 0.126493 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Google | 3444.0 | 0.004355 | 0.065861 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_HTC | 3444.0 | 0.031940 | 0.175865 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Honor | 3444.0 | 0.033682 | 0.180435 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Huawei | 3444.0 | 0.072880 | 0.259978 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Infinix | 3444.0 | 0.002904 | 0.053815 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Karbonn | 3444.0 | 0.008420 | 0.091389 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_LG | 3444.0 | 0.058362 | 0.234461 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Lava | 3444.0 | 0.010453 | 0.101719 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Lenovo | 3444.0 | 0.049652 | 0.217256 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Meizu | 3444.0 | 0.017131 | 0.129779 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Micromax | 3444.0 | 0.033972 | 0.181184 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Microsoft | 3444.0 | 0.006098 | 0.077860 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Motorola | 3444.0 | 0.030778 | 0.172741 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Nokia | 3444.0 | 0.029036 | 0.167932 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_OnePlus | 3444.0 | 0.006388 | 0.079680 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Oppo | 3444.0 | 0.037456 | 0.189905 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Others | 3444.0 | 0.145761 | 0.352917 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Panasonic | 3444.0 | 0.013647 | 0.116037 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Realme | 3444.0 | 0.011905 | 0.108473 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Samsung | 3444.0 | 0.099013 | 0.298723 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Sony | 3444.0 | 0.024971 | 0.156059 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Spice | 3444.0 | 0.008711 | 0.092938 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Vivo | 3444.0 | 0.033972 | 0.181184 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_XOLO | 3444.0 | 0.014228 | 0.118445 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_Xiaomi | 3444.0 | 0.038328 | 0.192014 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| brand_name_ZTE | 3444.0 | 0.040650 | 0.197508 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| os_Others | 3444.0 | 0.038037 | 0.191314 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| os_Windows | 3444.0 | 0.019164 | 0.137120 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| os_iOS | 3444.0 | 0.010453 | 0.101719 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| int_memory_log | 3444.0 | 3.542343 | 0.899108 | 0.009950 | 2.833213 | 3.496508 | 4.174387 | 6.932448 |
numeric_columns = (Updatedf.select_dtypes(include=np.number).columns.tolist())
numeric_columns.remove('release_year') # dropping year column as it is temporal variable
sns.set(font_scale = 1)
# correlation heatmap
plt.figure(figsize=(15, 7))
sns.heatmap(
Updatedf[numeric_columns].corr(),
annot=True,
vmin=-1,
vmax=1,
fmt=".2f",
cmap="Spectral",
)
plt.show()
We see a very high correlation between os IOS and Apple brand name since they only run the iOS on the Apple phones. In order to remove these highly correlative variables, we will remove the brand_name_Apple column. Next graph zooms into areas where we may have higher correlations to observe.
numeric_columns = ['screen_size', '4g', '5g', 'main_camera_mp', 'selfie_camera_mp', 'ram', 'battery', 'weight', 'days_used',
'normalized_used_price', 'normalized_new_price', 'int_memory_log', 'os_Others', 'os_Windows', 'os_iOS',
'brand_name_Apple']
sns.set(font_scale = 2)
# correlation heatmap
plt.figure(figsize=(15, 7))
sns.heatmap(
Updatedf[numeric_columns].corr(),
annot=True,
vmin=-1,
vmax=1,
fmt=".2f",
cmap="Spectral",
)
plt.show()
We see a fairly high correlation with battery and weight with screen size. I do not think we need to adjust any variables, but keep an eye on those when modeling. I will remove the brand_name_Apple column so we do not get interference with the model.
Updatedf.drop("brand_name_Apple", axis=1, inplace=True)
Updatedf.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 3444 entries, 0 to 3453 Data columns (total 48 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 screen_size 3444 non-null float64 1 4g 3444 non-null float64 2 5g 3444 non-null float64 3 main_camera_mp 3444 non-null float64 4 selfie_camera_mp 3444 non-null float64 5 ram 3444 non-null float64 6 battery 3444 non-null float64 7 weight 3444 non-null float64 8 release_year 3444 non-null float64 9 days_used 3444 non-null float64 10 normalized_used_price 3444 non-null float64 11 normalized_new_price 3444 non-null float64 12 brand_name_Alcatel 3444 non-null float64 13 brand_name_Asus 3444 non-null float64 14 brand_name_BlackBerry 3444 non-null float64 15 brand_name_Celkon 3444 non-null float64 16 brand_name_Coolpad 3444 non-null float64 17 brand_name_Gionee 3444 non-null float64 18 brand_name_Google 3444 non-null float64 19 brand_name_HTC 3444 non-null float64 20 brand_name_Honor 3444 non-null float64 21 brand_name_Huawei 3444 non-null float64 22 brand_name_Infinix 3444 non-null float64 23 brand_name_Karbonn 3444 non-null float64 24 brand_name_LG 3444 non-null float64 25 brand_name_Lava 3444 non-null float64 26 brand_name_Lenovo 3444 non-null float64 27 brand_name_Meizu 3444 non-null float64 28 brand_name_Micromax 3444 non-null float64 29 brand_name_Microsoft 3444 non-null float64 30 brand_name_Motorola 3444 non-null float64 31 brand_name_Nokia 3444 non-null float64 32 brand_name_OnePlus 3444 non-null float64 33 brand_name_Oppo 3444 non-null float64 34 brand_name_Others 3444 non-null float64 35 brand_name_Panasonic 3444 non-null float64 36 brand_name_Realme 3444 non-null float64 37 brand_name_Samsung 3444 non-null float64 38 brand_name_Sony 3444 non-null float64 39 brand_name_Spice 3444 non-null float64 40 brand_name_Vivo 3444 non-null float64 41 brand_name_XOLO 3444 non-null float64 42 brand_name_Xiaomi 3444 non-null float64 43 brand_name_ZTE 3444 non-null float64 44 os_Others 3444 non-null float64 45 os_Windows 3444 non-null float64 46 os_iOS 3444 non-null float64 47 int_memory_log 3444 non-null float64 dtypes: float64(48) memory usage: 1.3 MB
sns.set(font_scale = 2)
cData_attr = Updatedf.iloc[:, 0:9]
sns.pairplot(
cData_attr, diag_kind="kde"
) # to plot density curve instead of histogram on the diag
<seaborn.axisgrid.PairGrid at 0x1e3c13ef670>
histogram_boxplot(Updatedf,'normalized_new_price')
histogram_boxplot(Updatedf,'normalized_used_price')
histogram_boxplot(Updatedf,'int_memory_log')
histogram_boxplot(Updatedf,'battery')
Data looks good to begin building the model.
sns.set(font_scale = 2)
cData_attr = Updatedf.iloc[:, 0:9]
sns.pairplot(
cData_attr, diag_kind="kde"
) # to plot density curve instead of histogram on the diag
<seaborn.axisgrid.PairGrid at 0x1e3c13efaf0>
Not all of the correlations are perfectly linear, but a linear fit would capture a good portion of the data.
# independent variables
X = Updatedf.drop(["normalized_used_price"], axis=1)
# dependent variable
y = Updatedf[["normalized_used_price"]]
X = sm.add_constant(X)
X
| const | screen_size | 4g | 5g | main_camera_mp | selfie_camera_mp | ram | battery | weight | release_year | days_used | normalized_new_price | brand_name_Alcatel | brand_name_Asus | brand_name_BlackBerry | brand_name_Celkon | brand_name_Coolpad | brand_name_Gionee | brand_name_Google | brand_name_HTC | brand_name_Honor | brand_name_Huawei | brand_name_Infinix | brand_name_Karbonn | brand_name_LG | brand_name_Lava | brand_name_Lenovo | brand_name_Meizu | brand_name_Micromax | brand_name_Microsoft | brand_name_Motorola | brand_name_Nokia | brand_name_OnePlus | brand_name_Oppo | brand_name_Others | brand_name_Panasonic | brand_name_Realme | brand_name_Samsung | brand_name_Sony | brand_name_Spice | brand_name_Vivo | brand_name_XOLO | brand_name_Xiaomi | brand_name_ZTE | os_Others | os_Windows | os_iOS | int_memory_log | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 14.50 | 1.0 | 0.0 | 13.0 | 5.0 | 3.0 | 3020.0 | 146.0 | 2020.0 | 127.0 | 4.715100 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.174387 |
| 1 | 1.0 | 17.30 | 1.0 | 1.0 | 13.0 | 16.0 | 8.0 | 4300.0 | 213.0 | 2020.0 | 325.0 | 5.519018 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.859812 |
| 2 | 1.0 | 16.69 | 1.0 | 1.0 | 13.0 | 8.0 | 8.0 | 4200.0 | 213.0 | 2020.0 | 162.0 | 5.884631 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.859812 |
| 3 | 1.0 | 25.50 | 1.0 | 1.0 | 13.0 | 8.0 | 6.0 | 7250.0 | 480.0 | 2020.0 | 345.0 | 5.630961 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.174387 |
| 4 | 1.0 | 15.32 | 1.0 | 0.0 | 13.0 | 8.0 | 3.0 | 5000.0 | 185.0 | 2020.0 | 293.0 | 4.947837 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.174387 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3449 | 1.0 | 15.34 | 1.0 | 0.0 | 8.0 | 8.0 | 6.0 | 5000.0 | 190.0 | 2019.0 | 232.0 | 6.483872 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.174387 |
| 3450 | 1.0 | 15.24 | 1.0 | 0.0 | 13.0 | 8.0 | 8.0 | 4000.0 | 200.0 | 2018.0 | 541.0 | 6.251538 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.859812 |
| 3451 | 1.0 | 15.80 | 1.0 | 0.0 | 13.0 | 5.0 | 3.0 | 4000.0 | 165.0 | 2020.0 | 201.0 | 4.528829 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.496508 |
| 3452 | 1.0 | 15.80 | 1.0 | 0.0 | 13.0 | 5.0 | 2.0 | 4000.0 | 160.0 | 2020.0 | 149.0 | 4.624188 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.496508 |
| 3453 | 1.0 | 12.83 | 1.0 | 0.0 | 13.0 | 5.0 | 2.0 | 4000.0 | 168.0 | 2020.0 | 176.0 | 4.279994 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.833213 |
3444 rows × 48 columns
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.30, random_state=1
)
X_train.shape
(2410, 48)
X_test.shape
(1034, 48)
print(X_train.head())
const screen_size 4g 5g main_camera_mp selfie_camera_mp ram \
1388 1.0 7.75 0.0 0.0 0.3 0.3 0.25
2662 1.0 12.83 1.0 0.0 16.0 8.0 4.00
684 1.0 12.83 1.0 0.0 13.0 5.0 4.00
1454 1.0 12.70 0.0 0.0 10.0 2.0 4.00
3333 1.0 15.37 1.0 0.0 13.0 8.0 4.00
battery weight release_year days_used normalized_new_price \
1388 1350.0 130.0 2013.0 575.0 4.096675
2662 3000.0 160.0 2016.0 991.0 5.797546
684 3000.0 167.5 2017.0 624.0 5.211233
1454 4000.0 174.0 2014.0 668.0 4.871143
3333 3750.0 188.0 2020.0 179.0 5.110420
brand_name_Alcatel brand_name_Asus brand_name_BlackBerry \
1388 0.0 0.0 0.0
2662 0.0 0.0 0.0
684 0.0 0.0 0.0
1454 0.0 0.0 0.0
3333 0.0 0.0 0.0
brand_name_Celkon brand_name_Coolpad brand_name_Gionee \
1388 0.0 0.0 0.0
2662 0.0 0.0 0.0
684 0.0 0.0 0.0
1454 0.0 0.0 0.0
3333 0.0 0.0 0.0
brand_name_Google brand_name_HTC brand_name_Honor brand_name_Huawei \
1388 0.0 0.0 0.0 1.0
2662 0.0 0.0 0.0 0.0
684 0.0 0.0 0.0 0.0
1454 0.0 0.0 0.0 0.0
3333 0.0 0.0 1.0 0.0
brand_name_Infinix brand_name_Karbonn brand_name_LG brand_name_Lava \
1388 0.0 0.0 0.0 0.0
2662 0.0 0.0 0.0 0.0
684 0.0 0.0 0.0 0.0
1454 0.0 0.0 0.0 1.0
3333 0.0 0.0 0.0 0.0
brand_name_Lenovo brand_name_Meizu brand_name_Micromax \
1388 0.0 0.0 0.0
2662 0.0 0.0 0.0
684 0.0 0.0 0.0
1454 0.0 0.0 0.0
3333 0.0 0.0 0.0
brand_name_Microsoft brand_name_Motorola brand_name_Nokia \
1388 0.0 0.0 0.0
2662 0.0 0.0 0.0
684 0.0 0.0 0.0
1454 0.0 0.0 0.0
3333 0.0 0.0 0.0
brand_name_OnePlus brand_name_Oppo brand_name_Others \
1388 0.0 0.0 0.0
2662 0.0 0.0 1.0
684 0.0 0.0 1.0
1454 0.0 0.0 0.0
3333 0.0 0.0 0.0
brand_name_Panasonic brand_name_Realme brand_name_Samsung \
1388 0.0 0.0 0.0
2662 0.0 0.0 0.0
684 0.0 0.0 0.0
1454 0.0 0.0 0.0
3333 0.0 0.0 0.0
brand_name_Sony brand_name_Spice brand_name_Vivo brand_name_XOLO \
1388 0.0 0.0 0.0 0.0
2662 0.0 0.0 0.0 0.0
684 0.0 0.0 0.0 0.0
1454 0.0 0.0 0.0 0.0
3333 0.0 0.0 0.0 0.0
brand_name_Xiaomi brand_name_ZTE os_Others os_Windows os_iOS \
1388 0.0 0.0 0.0 0.0 0.0
2662 0.0 0.0 0.0 0.0 0.0
684 0.0 0.0 0.0 0.0 0.0
1454 0.0 0.0 0.0 0.0 0.0
3333 0.0 0.0 0.0 0.0 0.0
int_memory_log
1388 5.549076
2662 3.496508
684 2.833213
1454 3.496508
3333 4.859812
print(X_test.head())
const screen_size 4g 5g main_camera_mp selfie_camera_mp ram \
1013 1.0 12.70 0.0 0.0 8.0 5.0 4.00
322 1.0 10.16 0.0 0.0 2.0 0.3 0.25
2741 1.0 10.24 1.0 0.0 20.7 2.0 4.00
2903 1.0 10.16 0.0 0.0 5.0 0.3 4.00
102 1.0 15.29 1.0 0.0 8.0 20.0 6.00
battery weight release_year days_used normalized_new_price \
1013 2000.0 150.0 2015.0 1031.0 4.628594
322 1500.0 146.5 2014.0 900.0 3.891820
2741 2300.0 137.0 2014.0 1054.0 5.772220
2903 1500.0 136.0 2013.0 625.0 4.714742
102 3600.0 165.0 2019.0 332.0 6.040659
brand_name_Alcatel brand_name_Asus brand_name_BlackBerry \
1013 0.0 0.0 0.0
322 0.0 0.0 0.0
2741 0.0 0.0 0.0
2903 0.0 0.0 0.0
102 0.0 0.0 0.0
brand_name_Celkon brand_name_Coolpad brand_name_Gionee \
1013 0.0 0.0 0.0
322 0.0 0.0 0.0
2741 0.0 0.0 0.0
2903 0.0 0.0 0.0
102 0.0 0.0 0.0
brand_name_Google brand_name_HTC brand_name_Honor brand_name_Huawei \
1013 0.0 0.0 0.0 0.0
322 0.0 0.0 0.0 0.0
2741 0.0 0.0 0.0 0.0
2903 0.0 0.0 0.0 0.0
102 0.0 0.0 0.0 0.0
brand_name_Infinix brand_name_Karbonn brand_name_LG brand_name_Lava \
1013 0.0 0.0 0.0 0.0
322 0.0 0.0 0.0 0.0
2741 0.0 0.0 0.0 0.0
2903 0.0 0.0 0.0 0.0
102 0.0 0.0 0.0 0.0
brand_name_Lenovo brand_name_Meizu brand_name_Micromax \
1013 0.0 0.0 0.0
322 0.0 0.0 1.0
2741 0.0 0.0 0.0
2903 0.0 0.0 0.0
102 0.0 1.0 0.0
brand_name_Microsoft brand_name_Motorola brand_name_Nokia \
1013 0.0 0.0 0.0
322 0.0 0.0 0.0
2741 0.0 0.0 0.0
2903 0.0 0.0 0.0
102 0.0 0.0 0.0
brand_name_OnePlus brand_name_Oppo brand_name_Others \
1013 0.0 0.0 1.0
322 0.0 0.0 0.0
2741 0.0 0.0 0.0
2903 0.0 0.0 1.0
102 0.0 0.0 0.0
brand_name_Panasonic brand_name_Realme brand_name_Samsung \
1013 0.0 0.0 0.0
322 0.0 0.0 0.0
2741 0.0 0.0 0.0
2903 0.0 0.0 0.0
102 0.0 0.0 0.0
brand_name_Sony brand_name_Spice brand_name_Vivo brand_name_XOLO \
1013 0.0 0.0 0.0 0.0
322 0.0 0.0 0.0 0.0
2741 1.0 0.0 0.0 0.0
2903 0.0 0.0 0.0 0.0
102 0.0 0.0 0.0 0.0
brand_name_Xiaomi brand_name_ZTE os_Others os_Windows os_iOS \
1013 0.0 0.0 0.0 0.0 0.0
322 0.0 0.0 0.0 0.0 0.0
2741 0.0 0.0 0.0 0.0 0.0
2903 0.0 0.0 0.0 0.0 0.0
102 0.0 0.0 0.0 0.0 0.0
int_memory_log
1013 3.496508
322 0.405465
2741 2.833213
2903 2.833213
102 4.859812
olsmod = sm.OLS(y_train, X_train)
olsres = olsmod.fit()
print(olsres.summary())
OLS Regression Results
=================================================================================
Dep. Variable: normalized_used_price R-squared: 0.846
Model: OLS Adj. R-squared: 0.843
Method: Least Squares F-statistic: 276.7
Date: Fri, 22 Apr 2022 Prob (F-statistic): 0.00
Time: 16:38:59 Log-Likelihood: 116.51
No. Observations: 2410 AIC: -137.0
Df Residuals: 2362 BIC: 140.8
Df Model: 47
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------
const -53.6411 9.240 -5.805 0.000 -71.761 -35.521
screen_size 0.0247 0.003 7.307 0.000 0.018 0.031
4g 0.0442 0.016 2.764 0.006 0.013 0.076
5g -0.0420 0.032 -1.334 0.182 -0.104 0.020
main_camera_mp 0.0208 0.001 14.388 0.000 0.018 0.024
selfie_camera_mp 0.0119 0.001 10.240 0.000 0.010 0.014
ram 0.0260 0.005 5.052 0.000 0.016 0.036
battery -1.87e-05 7.5e-06 -2.492 0.013 -3.34e-05 -3.99e-06
weight 0.0010 0.000 7.677 0.000 0.001 0.001
release_year 0.0272 0.005 5.944 0.000 0.018 0.036
days_used 8.199e-06 3.06e-05 0.268 0.789 -5.18e-05 6.82e-05
normalized_new_price 0.4284 0.012 34.921 0.000 0.404 0.452
brand_name_Alcatel -0.0444 0.046 -0.968 0.333 -0.134 0.046
brand_name_Asus 0.0137 0.046 0.298 0.765 -0.076 0.104
brand_name_BlackBerry 0.0313 0.067 0.465 0.642 -0.101 0.163
brand_name_Celkon -0.1546 0.067 -2.321 0.020 -0.285 -0.024
brand_name_Coolpad 0.0160 0.072 0.222 0.824 -0.125 0.157
brand_name_Gionee -0.0467 0.057 -0.824 0.410 -0.158 0.064
brand_name_Google 0.0631 0.091 0.691 0.490 -0.116 0.242
brand_name_HTC -0.0555 0.047 -1.181 0.238 -0.148 0.037
brand_name_Honor -0.0176 0.048 -0.368 0.713 -0.111 0.076
brand_name_Huawei -0.0444 0.043 -1.042 0.298 -0.128 0.039
brand_name_Infinix 0.1136 0.089 1.283 0.200 -0.060 0.287
brand_name_Karbonn -0.0202 0.062 -0.324 0.746 -0.143 0.102
brand_name_LG -0.0747 0.043 -1.728 0.084 -0.160 0.010
brand_name_Lava -0.0366 0.064 -0.570 0.568 -0.163 0.089
brand_name_Lenovo -0.0032 0.044 -0.072 0.942 -0.089 0.083
brand_name_Meizu -0.0457 0.054 -0.842 0.400 -0.152 0.061
brand_name_Micromax -0.0571 0.046 -1.228 0.219 -0.148 0.034
brand_name_Microsoft 0.0487 0.086 0.566 0.571 -0.120 0.217
brand_name_Motorola -0.0526 0.048 -1.100 0.272 -0.146 0.041
brand_name_Nokia 0.0563 0.050 1.122 0.262 -0.042 0.155
brand_name_OnePlus 0.0284 0.071 0.399 0.690 -0.111 0.168
brand_name_Oppo -0.0233 0.047 -0.499 0.618 -0.115 0.068
brand_name_Others -0.0452 0.040 -1.121 0.262 -0.124 0.034
brand_name_Panasonic -0.0040 0.054 -0.073 0.941 -0.110 0.102
brand_name_Realme 0.0874 0.060 1.455 0.146 -0.030 0.205
brand_name_Samsung -0.0563 0.041 -1.366 0.172 -0.137 0.025
brand_name_Sony -0.0939 0.049 -1.899 0.058 -0.191 0.003
brand_name_Spice -0.0781 0.066 -1.178 0.239 -0.208 0.052
brand_name_Vivo -0.0431 0.047 -0.915 0.361 -0.136 0.049
brand_name_XOLO -0.0286 0.056 -0.513 0.608 -0.138 0.081
brand_name_Xiaomi 0.0700 0.046 1.517 0.129 -0.020 0.161
brand_name_ZTE -0.0230 0.045 -0.507 0.612 -0.112 0.066
os_Others -0.0474 0.031 -1.508 0.132 -0.109 0.014
os_Windows -0.0371 0.043 -0.860 0.390 -0.122 0.048
os_iOS -0.1273 0.058 -2.211 0.027 -0.240 -0.014
int_memory_log 0.0144 0.008 1.899 0.058 -0.000 0.029
==============================================================================
Omnibus: 250.011 Durbin-Watson: 2.073
Prob(Omnibus): 0.000 Jarque-Bera (JB): 567.434
Skew: -0.623 Prob(JB): 6.07e-124
Kurtosis: 5.024 Cond. No. 7.68e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.68e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
First pass fit of the model with the data.
# let's check the VIF of the predictors
from statsmodels.stats.outliers_influence import variance_inflation_factor
vif_series1 = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
)
print("VIF values: \n\n{}\n".format(vif_series1))
VIF values: const 3.794178e+06 screen_size 7.174251e+00 4g 2.448213e+00 5g 1.837528e+00 main_camera_mp 2.092391e+00 selfie_camera_mp 2.942907e+00 ram 2.256441e+00 battery 4.046333e+00 weight 5.661834e+00 release_year 4.916392e+00 days_used 2.623560e+00 normalized_new_price 3.152558e+00 brand_name_Alcatel 3.220959e+00 brand_name_Asus 3.158797e+00 brand_name_BlackBerry 1.574673e+00 brand_name_Celkon 1.783562e+00 brand_name_Coolpad 1.421608e+00 brand_name_Gionee 1.869472e+00 brand_name_Google 1.226642e+00 brand_name_HTC 3.074156e+00 brand_name_Honor 3.084660e+00 brand_name_Huawei 5.380426e+00 brand_name_Infinix 1.296994e+00 brand_name_Karbonn 1.639698e+00 brand_name_LG 4.703036e+00 brand_name_Lava 1.584313e+00 brand_name_Lenovo 4.112065e+00 brand_name_Meizu 2.082950e+00 brand_name_Micromax 3.227530e+00 brand_name_Microsoft 1.762802e+00 brand_name_Motorola 3.061213e+00 brand_name_Nokia 3.243433e+00 brand_name_OnePlus 1.483652e+00 brand_name_Oppo 3.481233e+00 brand_name_Others 8.793091e+00 brand_name_Panasonic 2.058143e+00 brand_name_Realme 1.843091e+00 brand_name_Samsung 6.725934e+00 brand_name_Sony 2.593125e+00 brand_name_Spice 1.530000e+00 brand_name_Vivo 3.248885e+00 brand_name_XOLO 1.920534e+00 brand_name_Xiaomi 3.586151e+00 brand_name_ZTE 3.664081e+00 os_Others 1.660626e+00 os_Windows 1.646986e+00 os_iOS 1.929141e+00 int_memory_log 2.019516e+00 dtype: float64
Columns that have a VIF over 5 include: screen_size, weight, brand_name_Huawei, brand_name_Others, and brand_name_Samsung. I will systematically drop them to see if multicollinearity exists and if we can improve our model.
def treating_multicollinearity(predictors, target, high_vif_columns):
"""
Checking the effect of dropping the columns showing high multicollinearity
on model performance (adj. R-squared and RMSE)
predictors: independent variables
target: dependent variable
high_vif_columns: columns having high VIF
"""
# empty lists to store adj. R-squared and RMSE values
adj_r2 = []
rmse = []
# build ols models by dropping one of the high VIF columns at a time
# store the adjusted R-squared and RMSE in the lists defined previously
for cols in high_vif_columns:
# defining the new train set
train = predictors.loc[:, ~predictors.columns.str.startswith(cols)]
# create the model
olsmodel = sm.OLS(target, train).fit()
# adding adj. R-squared and RMSE to the lists
adj_r2.append(olsmodel.rsquared_adj)
rmse.append(np.sqrt(olsmodel.mse_resid))
# creating a dataframe for the results
temp = pd.DataFrame(
{
"col": high_vif_columns,
"Adj. R-squared after_dropping col": adj_r2,
"RMSE after dropping col": rmse,
}
).sort_values(by="Adj. R-squared after_dropping col", ascending=False)
temp.reset_index(drop=True, inplace=True)
return temp
from statsmodels.stats.outliers_influence import variance_inflation_factor
# we will define a function to check VIF
def checking_vif(predictors):
vif = pd.DataFrame()
vif["feature"] = predictors.columns
# calculating VIF for each feature
vif["VIF"] = [
variance_inflation_factor(predictors.values, i)
for i in range(len(predictors.columns))
]
return vif
col_list = [
"screen_size",
"weight",
"brand_name_Huawei",
"brand_name_Others",
"brand_name_Samsung",
]
res = treating_multicollinearity(X_train, y_train, col_list)
res
| col | Adj. R-squared after_dropping col | RMSE after dropping col | |
|---|---|---|---|
| 0 | brand_name_Huawei | 0.843218 | 0.232887 |
| 1 | brand_name_Others | 0.843207 | 0.232895 |
| 2 | brand_name_Samsung | 0.843166 | 0.232925 |
| 3 | screen_size | 0.839748 | 0.235450 |
| 4 | weight | 0.839380 | 0.235720 |
col_to_drop = "brand_name_Others"
X_train2 = X_train.loc[:, ~X_train.columns.str.startswith(col_to_drop)]
X_test2 = X_test.loc[:, ~X_test.columns.str.startswith(col_to_drop)]
# Check VIF now
vif = checking_vif(X_train2)
print("VIF after dropping ", col_to_drop)
vif
VIF after dropping brand_name_Others
| feature | VIF | |
|---|---|---|
| 0 | const | 3.793359e+06 |
| 1 | screen_size | 7.116506e+00 |
| 2 | 4g | 2.447674e+00 |
| 3 | 5g | 1.836527e+00 |
| 4 | main_camera_mp | 2.092348e+00 |
| 5 | selfie_camera_mp | 2.940768e+00 |
| 6 | ram | 2.253569e+00 |
| 7 | battery | 4.043940e+00 |
| 8 | weight | 5.658221e+00 |
| 9 | release_year | 4.914602e+00 |
| 10 | days_used | 2.623235e+00 |
| 11 | normalized_new_price | 3.150880e+00 |
| 12 | brand_name_Alcatel | 1.224995e+00 |
| 13 | brand_name_Asus | 1.205557e+00 |
| 14 | brand_name_BlackBerry | 1.130325e+00 |
| 15 | brand_name_Celkon | 1.236593e+00 |
| 16 | brand_name_Coolpad | 1.049609e+00 |
| 17 | brand_name_Gionee | 1.091867e+00 |
| 18 | brand_name_Google | 1.034049e+00 |
| 19 | brand_name_HTC | 1.220144e+00 |
| 20 | brand_name_Honor | 1.282118e+00 |
| 21 | brand_name_Huawei | 1.475672e+00 |
| 22 | brand_name_Infinix | 1.070747e+00 |
| 23 | brand_name_Karbonn | 1.085278e+00 |
| 24 | brand_name_LG | 1.364427e+00 |
| 25 | brand_name_Lava | 1.066525e+00 |
| 26 | brand_name_Lenovo | 1.294481e+00 |
| 27 | brand_name_Meizu | 1.131464e+00 |
| 28 | brand_name_Micromax | 1.229721e+00 |
| 29 | brand_name_Microsoft | 1.446401e+00 |
| 30 | brand_name_Motorola | 1.247909e+00 |
| 31 | brand_name_Nokia | 1.547098e+00 |
| 32 | brand_name_OnePlus | 1.100561e+00 |
| 33 | brand_name_Oppo | 1.338642e+00 |
| 34 | brand_name_Panasonic | 1.108561e+00 |
| 35 | brand_name_Realme | 1.153888e+00 |
| 36 | brand_name_Samsung | 1.561249e+00 |
| 37 | brand_name_Sony | 1.192665e+00 |
| 38 | brand_name_Spice | 1.069790e+00 |
| 39 | brand_name_Vivo | 1.303555e+00 |
| 40 | brand_name_XOLO | 1.106539e+00 |
| 41 | brand_name_Xiaomi | 1.328202e+00 |
| 42 | brand_name_ZTE | 1.273016e+00 |
| 43 | os_Others | 1.648751e+00 |
| 44 | os_Windows | 1.646932e+00 |
| 45 | os_iOS | 1.189540e+00 |
| 46 | int_memory_log | 2.018570e+00 |
col_to_drop = "screen_size"
X_train3 = X_train2.loc[:, ~X_train2.columns.str.startswith(col_to_drop)]
X_test3 = X_test2.loc[:, ~X_test2.columns.str.startswith(col_to_drop)]
# Check VIF now
vif = checking_vif(X_train3)
print("VIF after dropping ", col_to_drop)
vif
VIF after dropping screen_size
| feature | VIF | |
|---|---|---|
| 0 | const | 3.688556e+06 |
| 1 | 4g | 2.447394e+00 |
| 2 | 5g | 1.831502e+00 |
| 3 | main_camera_mp | 2.091480e+00 |
| 4 | selfie_camera_mp | 2.938961e+00 |
| 5 | ram | 2.253221e+00 |
| 6 | battery | 3.699069e+00 |
| 7 | weight | 2.840565e+00 |
| 8 | release_year | 4.775796e+00 |
| 9 | days_used | 2.609256e+00 |
| 10 | normalized_new_price | 3.094935e+00 |
| 11 | brand_name_Alcatel | 1.208880e+00 |
| 12 | brand_name_Asus | 1.204467e+00 |
| 13 | brand_name_BlackBerry | 1.130294e+00 |
| 14 | brand_name_Celkon | 1.236043e+00 |
| 15 | brand_name_Coolpad | 1.049554e+00 |
| 16 | brand_name_Gionee | 1.091344e+00 |
| 17 | brand_name_Google | 1.033948e+00 |
| 18 | brand_name_HTC | 1.219735e+00 |
| 19 | brand_name_Honor | 1.268269e+00 |
| 20 | brand_name_Huawei | 1.468750e+00 |
| 21 | brand_name_Infinix | 1.070674e+00 |
| 22 | brand_name_Karbonn | 1.079545e+00 |
| 23 | brand_name_LG | 1.364332e+00 |
| 24 | brand_name_Lava | 1.065016e+00 |
| 25 | brand_name_Lenovo | 1.289505e+00 |
| 26 | brand_name_Meizu | 1.131301e+00 |
| 27 | brand_name_Micromax | 1.226932e+00 |
| 28 | brand_name_Microsoft | 1.446399e+00 |
| 29 | brand_name_Motorola | 1.247740e+00 |
| 30 | brand_name_Nokia | 1.546101e+00 |
| 31 | brand_name_OnePlus | 1.098913e+00 |
| 32 | brand_name_Oppo | 1.337704e+00 |
| 33 | brand_name_Panasonic | 1.107731e+00 |
| 34 | brand_name_Realme | 1.153645e+00 |
| 35 | brand_name_Samsung | 1.559530e+00 |
| 36 | brand_name_Sony | 1.192581e+00 |
| 37 | brand_name_Spice | 1.068489e+00 |
| 38 | brand_name_Vivo | 1.296793e+00 |
| 39 | brand_name_XOLO | 1.103514e+00 |
| 40 | brand_name_Xiaomi | 1.327229e+00 |
| 41 | brand_name_ZTE | 1.271369e+00 |
| 42 | os_Others | 1.448132e+00 |
| 43 | os_Windows | 1.646640e+00 |
| 44 | os_iOS | 1.186887e+00 |
| 45 | int_memory_log | 2.014477e+00 |
Dropping brand_name_Others and screen_size removes the multicollinearity in the dataset. With multicollinearity removed, we can reliably review the p-values in the dataset and remove unneeded predictor variables.
olsmod_2 = sm.OLS(y_train, X_train3)
olsres_2 = olsmod_2.fit()
print(olsres_2.summary())
OLS Regression Results
=================================================================================
Dep. Variable: normalized_used_price R-squared: 0.843
Model: OLS Adj. R-squared: 0.840
Method: Least Squares F-statistic: 281.2
Date: Fri, 22 Apr 2022 Prob (F-statistic): 0.00
Time: 16:44:18 Log-Likelihood: 87.983
No. Observations: 2410 AIC: -83.97
Df Residuals: 2364 BIC: 182.3
Df Model: 45
Covariance Type: nonrobust
=========================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------------
const -64.9112 9.215 -7.044 0.000 -82.982 -46.840
4g 0.0426 0.016 2.638 0.008 0.011 0.074
5g -0.0551 0.032 -1.732 0.083 -0.117 0.007
main_camera_mp 0.0210 0.001 14.373 0.000 0.018 0.024
selfie_camera_mp 0.0120 0.001 10.283 0.000 0.010 0.014
ram 0.0267 0.005 5.130 0.000 0.016 0.037
battery -2.614e-06 7.26e-06 -0.360 0.719 -1.68e-05 1.16e-05
weight 0.0017 9.56e-05 18.000 0.000 0.002 0.002
release_year 0.0328 0.005 7.195 0.000 0.024 0.042
days_used -8.794e-06 3.09e-05 -0.285 0.776 -6.93e-05 5.17e-05
normalized_new_price 0.4409 0.012 35.859 0.000 0.417 0.465
brand_name_Alcatel 0.0202 0.028 0.710 0.478 -0.036 0.076
brand_name_Asus 0.0606 0.029 2.110 0.035 0.004 0.117
brand_name_BlackBerry 0.0692 0.058 1.199 0.231 -0.044 0.182
brand_name_Celkon -0.1046 0.056 -1.864 0.062 -0.215 0.005
brand_name_Coolpad 0.0605 0.063 0.968 0.333 -0.062 0.183
brand_name_Gionee -0.0128 0.044 -0.292 0.771 -0.099 0.073
brand_name_Google 0.0975 0.085 1.149 0.251 -0.069 0.264
brand_name_HTC -0.0105 0.030 -0.352 0.725 -0.069 0.048
brand_name_Honor 0.0470 0.031 1.521 0.128 -0.014 0.108
brand_name_Huawei 0.0077 0.023 0.340 0.734 -0.037 0.052
brand_name_Infinix 0.1600 0.081 1.965 0.049 0.000 0.320
brand_name_Karbonn 0.0480 0.051 0.936 0.350 -0.053 0.149
brand_name_LG -0.0324 0.024 -1.377 0.169 -0.079 0.014
brand_name_Lava 0.0193 0.053 0.362 0.718 -0.085 0.124
brand_name_Lenovo 0.0487 0.025 1.967 0.049 0.000 0.097
brand_name_Meizu -0.0010 0.040 -0.025 0.980 -0.080 0.078
brand_name_Micromax -0.0059 0.029 -0.205 0.838 -0.063 0.051
brand_name_Microsoft 0.0889 0.079 1.128 0.259 -0.066 0.243
brand_name_Motorola -0.0140 0.031 -0.452 0.651 -0.074 0.047
brand_name_Nokia 0.0904 0.035 2.581 0.010 0.022 0.159
brand_name_OnePlus 0.0865 0.062 1.397 0.162 -0.035 0.208
brand_name_Oppo 0.0234 0.029 0.801 0.423 -0.034 0.081
brand_name_Panasonic 0.0452 0.040 1.128 0.259 -0.033 0.124
brand_name_Realme 0.1338 0.048 2.781 0.005 0.039 0.228
brand_name_Samsung -0.0109 0.020 -0.543 0.587 -0.050 0.028
brand_name_Sony -0.0511 0.034 -1.506 0.132 -0.118 0.015
brand_name_Spice -0.0230 0.056 -0.409 0.682 -0.133 0.087
brand_name_Vivo 0.0138 0.030 0.457 0.648 -0.045 0.073
brand_name_XOLO 0.0285 0.043 0.668 0.504 -0.055 0.112
brand_name_Xiaomi 0.1167 0.028 4.110 0.000 0.061 0.172
brand_name_ZTE 0.0252 0.027 0.934 0.350 -0.028 0.078
os_Others -0.1256 0.030 -4.233 0.000 -0.184 -0.067
os_Windows -0.0325 0.044 -0.746 0.456 -0.118 0.053
os_iOS -0.1032 0.046 -2.259 0.024 -0.193 -0.014
int_memory_log 0.0117 0.008 1.525 0.127 -0.003 0.027
==============================================================================
Omnibus: 249.583 Durbin-Watson: 2.088
Prob(Omnibus): 0.000 Jarque-Bera (JB): 536.525
Skew: -0.639 Prob(JB): 3.13e-117
Kurtosis: 4.926 Cond. No. 7.57e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 7.57e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
X_train3.head()
| const | 4g | 5g | main_camera_mp | selfie_camera_mp | ram | battery | weight | release_year | days_used | normalized_new_price | brand_name_Alcatel | brand_name_Asus | brand_name_BlackBerry | brand_name_Celkon | brand_name_Coolpad | brand_name_Gionee | brand_name_Google | brand_name_HTC | brand_name_Honor | brand_name_Huawei | brand_name_Infinix | brand_name_Karbonn | brand_name_LG | brand_name_Lava | brand_name_Lenovo | brand_name_Meizu | brand_name_Micromax | brand_name_Microsoft | brand_name_Motorola | brand_name_Nokia | brand_name_OnePlus | brand_name_Oppo | brand_name_Panasonic | brand_name_Realme | brand_name_Samsung | brand_name_Sony | brand_name_Spice | brand_name_Vivo | brand_name_XOLO | brand_name_Xiaomi | brand_name_ZTE | os_Others | os_Windows | os_iOS | int_memory_log | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1388 | 1.0 | 0.0 | 0.0 | 0.3 | 0.3 | 0.25 | 1350.0 | 130.0 | 2013.0 | 575.0 | 4.096675 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.549076 |
| 2662 | 1.0 | 1.0 | 0.0 | 16.0 | 8.0 | 4.00 | 3000.0 | 160.0 | 2016.0 | 991.0 | 5.797546 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.496508 |
| 684 | 1.0 | 1.0 | 0.0 | 13.0 | 5.0 | 4.00 | 3000.0 | 167.5 | 2017.0 | 624.0 | 5.211233 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.833213 |
| 1454 | 1.0 | 0.0 | 0.0 | 10.0 | 2.0 | 4.00 | 4000.0 | 174.0 | 2014.0 | 668.0 | 4.871143 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.496508 |
| 3333 | 1.0 | 1.0 | 0.0 | 13.0 | 8.0 | 4.00 | 3750.0 | 188.0 | 2020.0 | 179.0 | 5.110420 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.859812 |
If the p-value is larger than 0.05 (our 95% confidence interval), we can remove it since it is not significant. The following columns satisfy that removal: 5g, battery, days_used, brand_name_Alcatel, brand_name_BlackBerry, brand_name_Celkon, brand_name_Coolpad, brand_name_Gionee, brand_name_Google, brand_name_HTC, brand_name_Honor, brand_name_Huawei, brand_name_Infinix, brand_name_Karbonn, brand_name_LG, brand_name_Lava, brand_name_Meizu, brand_name_Micromax, brand_name_Microsoft, brand_name_Motorola, brand_name_OnePlus, brand_name_Oppo, brand_name_Panasonic, brand_name_Samsung, brand_name_Sony, brand_name_Spice, brand_name_Vivo, brand_name_XOLO, brand_name_ZTE, os_Windows, int_memory_log.
X_train8 = X_train3.drop(["5g", "battery", "days_used", "brand_name_Alcatel", "brand_name_BlackBerry",
"brand_name_Celkon", "brand_name_Coolpad", "brand_name_Gionee", "brand_name_Google",
"brand_name_HTC", "brand_name_Honor", "brand_name_Huawei","brand_name_Infinix",
"brand_name_Karbonn", "brand_name_LG", "brand_name_Lava", "brand_name_Meizu",
"brand_name_Micromax", "brand_name_Microsoft", "brand_name_Motorola", "brand_name_OnePlus",
"brand_name_Oppo", "brand_name_Panasonic", "brand_name_Samsung", "brand_name_Sony",
"brand_name_Spice", "brand_name_Vivo", "brand_name_XOLO", "brand_name_ZTE",
"os_Windows", "int_memory_log"], axis=1)
olsmod_10 = sm.OLS(y_train, X_train8)
olsres_10 = olsmod_10.fit()
print(olsres_10.summary())
OLS Regression Results
=================================================================================
Dep. Variable: normalized_used_price R-squared: 0.840
Model: OLS Adj. R-squared: 0.839
Method: Least Squares F-statistic: 900.6
Date: Fri, 22 Apr 2022 Prob (F-statistic): 0.00
Time: 16:46:07 Log-Likelihood: 70.975
No. Observations: 2410 AIC: -111.9
Df Residuals: 2395 BIC: -25.14
Df Model: 14
Covariance Type: nonrobust
========================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------
const -69.6731 6.774 -10.285 0.000 -82.957 -56.389
4g 0.0399 0.015 2.641 0.008 0.010 0.069
main_camera_mp 0.0213 0.001 15.824 0.000 0.019 0.024
selfie_camera_mp 0.0127 0.001 11.625 0.000 0.011 0.015
ram 0.0242 0.005 5.321 0.000 0.015 0.033
weight 0.0017 6.41e-05 26.450 0.000 0.002 0.002
release_year 0.0352 0.003 10.486 0.000 0.029 0.042
normalized_new_price 0.4386 0.011 39.407 0.000 0.417 0.460
brand_name_Asus 0.0586 0.026 2.217 0.027 0.007 0.110
brand_name_Lenovo 0.0460 0.022 2.076 0.038 0.003 0.089
brand_name_Nokia 0.0665 0.030 2.242 0.025 0.008 0.125
brand_name_Realme 0.1145 0.046 2.500 0.012 0.025 0.204
brand_name_Xiaomi 0.1074 0.025 4.261 0.000 0.058 0.157
os_Others -0.1325 0.028 -4.785 0.000 -0.187 -0.078
os_iOS -0.1000 0.044 -2.280 0.023 -0.186 -0.014
==============================================================================
Omnibus: 254.855 Durbin-Watson: 2.077
Prob(Omnibus): 0.000 Jarque-Bera (JB): 559.093
Skew: -0.644 Prob(JB): 3.93e-122
Kurtosis: 4.977 Cond. No. 2.86e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.86e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
After dropping features with multicollinearity and the statistically insignificant, the model performance has not dropped significantly (.846 to.840)
These assumptions are essential conditions that should be met before we draw inferences regarding the model estimates or use the model to make a prediction.
For Linear Regression, we need to check if the following assumptions hold:-
-Linearity -Independence -Homoscedasticity -Normality of error terms -No strong Multicollinearity
df_pred = pd.DataFrame()
df_pred["Actual Values"] = y_train.values.flatten() # actual values
df_pred["Fitted Values"] = olsres_10.fittedvalues.values # predicted values
df_pred["Residuals"] = olsres_10.resid.values # residuals
df_pred.head()
| Actual Values | Fitted Values | Residuals | |
|---|---|---|---|
| 0 | 3.028199 | 3.277840 | -0.249641 |
| 1 | 4.487062 | 4.743878 | -0.256816 |
| 2 | 4.616209 | 4.432553 | 0.183656 |
| 3 | 3.926517 | 4.046723 | -0.120206 |
| 4 | 4.558288 | 4.566874 | -0.008585 |
# let us plot the fitted values vs residuals
sns.set(font_scale = 2)
sns.set_style("whitegrid")
sns.residplot(
data=df_pred, x="Fitted Values", y="Residuals", color="purple", lowess=True
)
plt.xlabel("Fitted Values")
plt.ylabel("Residuals")
plt.title("Fitted vs Residual plot")
plt.show()
There does not appear to be a significant pattern to the residuals, data points seem randomly distributed.
sns.set(font_scale = 2)
sns.histplot(df_pred["Residuals"], kde=True)
plt.title("Normality of residuals")
plt.show()
import pylab
import scipy.stats as stats
sns.set(font_scale = 2)
stats.probplot(df_pred["Residuals"], dist="norm", plot=pylab)
plt.show()
stats.shapiro(df_pred["Residuals"])
ShapiroResult(statistic=0.9679368138313293, pvalue=9.1132493432497e-23)
Since p-value < 0.05, the residuals are not normal as per shapiro test. As an approximation, we can accept this distribution as close to being normal.
import statsmodels.stats.api as sms
from statsmodels.compat import lzip
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(df_pred["Residuals"], X_train8)
lzip(name, test)
[('F statistic', 1.090391531249412), ('p-value', 0.06784204757383994)]
Since p-value > 0.05 we can say that the residuals are homoscedastic.
This satisfies all the assumptions of linear regression, time to review the final model.
print(olsres_10.summary())
OLS Regression Results
=================================================================================
Dep. Variable: normalized_used_price R-squared: 0.840
Model: OLS Adj. R-squared: 0.839
Method: Least Squares F-statistic: 900.6
Date: Fri, 22 Apr 2022 Prob (F-statistic): 0.00
Time: 16:50:23 Log-Likelihood: 70.975
No. Observations: 2410 AIC: -111.9
Df Residuals: 2395 BIC: -25.14
Df Model: 14
Covariance Type: nonrobust
========================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------
const -69.6731 6.774 -10.285 0.000 -82.957 -56.389
4g 0.0399 0.015 2.641 0.008 0.010 0.069
main_camera_mp 0.0213 0.001 15.824 0.000 0.019 0.024
selfie_camera_mp 0.0127 0.001 11.625 0.000 0.011 0.015
ram 0.0242 0.005 5.321 0.000 0.015 0.033
weight 0.0017 6.41e-05 26.450 0.000 0.002 0.002
release_year 0.0352 0.003 10.486 0.000 0.029 0.042
normalized_new_price 0.4386 0.011 39.407 0.000 0.417 0.460
brand_name_Asus 0.0586 0.026 2.217 0.027 0.007 0.110
brand_name_Lenovo 0.0460 0.022 2.076 0.038 0.003 0.089
brand_name_Nokia 0.0665 0.030 2.242 0.025 0.008 0.125
brand_name_Realme 0.1145 0.046 2.500 0.012 0.025 0.204
brand_name_Xiaomi 0.1074 0.025 4.261 0.000 0.058 0.157
os_Others -0.1325 0.028 -4.785 0.000 -0.187 -0.078
os_iOS -0.1000 0.044 -2.280 0.023 -0.186 -0.014
==============================================================================
Omnibus: 254.855 Durbin-Watson: 2.077
Prob(Omnibus): 0.000 Jarque-Bera (JB): 559.093
Skew: -0.644 Prob(JB): 3.93e-122
Kurtosis: 4.977 Cond. No. 2.86e+06
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.86e+06. This might indicate that there are
strong multicollinearity or other numerical problems.
olsres_10.params
const -69.673139 4g 0.039878 main_camera_mp 0.021344 selfie_camera_mp 0.012698 ram 0.024208 weight 0.001695 release_year 0.035230 normalized_new_price 0.438575 brand_name_Asus 0.058597 brand_name_Lenovo 0.045969 brand_name_Nokia 0.066496 brand_name_Realme 0.114512 brand_name_Xiaomi 0.107353 os_Others -0.132485 os_iOS -0.099976 dtype: float64
X_train8.columns
Index(['const', '4g', 'main_camera_mp', 'selfie_camera_mp', 'ram', 'weight',
'release_year', 'normalized_new_price', 'brand_name_Asus',
'brand_name_Lenovo', 'brand_name_Nokia', 'brand_name_Realme',
'brand_name_Xiaomi', 'os_Others', 'os_iOS'],
dtype='object')
X_test.columns
Index(['const', 'screen_size', '4g', '5g', 'main_camera_mp',
'selfie_camera_mp', 'ram', 'battery', 'weight', 'release_year',
'days_used', 'normalized_new_price', 'brand_name_Alcatel',
'brand_name_Asus', 'brand_name_BlackBerry', 'brand_name_Celkon',
'brand_name_Coolpad', 'brand_name_Gionee', 'brand_name_Google',
'brand_name_HTC', 'brand_name_Honor', 'brand_name_Huawei',
'brand_name_Infinix', 'brand_name_Karbonn', 'brand_name_LG',
'brand_name_Lava', 'brand_name_Lenovo', 'brand_name_Meizu',
'brand_name_Micromax', 'brand_name_Microsoft', 'brand_name_Motorola',
'brand_name_Nokia', 'brand_name_OnePlus', 'brand_name_Oppo',
'brand_name_Others', 'brand_name_Panasonic', 'brand_name_Realme',
'brand_name_Samsung', 'brand_name_Sony', 'brand_name_Spice',
'brand_name_Vivo', 'brand_name_XOLO', 'brand_name_Xiaomi',
'brand_name_ZTE', 'os_Others', 'os_Windows', 'os_iOS',
'int_memory_log'],
dtype='object')
# dropping columns from the test data that are not there in the training data
X_test2 = X_test.drop(
[ 'screen_size', '5g', 'battery',
'days_used', 'brand_name_Alcatel',
'brand_name_BlackBerry', 'brand_name_Celkon',
'brand_name_Coolpad', 'brand_name_Gionee', 'brand_name_Google',
'brand_name_HTC', 'brand_name_Honor', 'brand_name_Huawei',
'brand_name_Infinix', 'brand_name_Karbonn', 'brand_name_LG',
'brand_name_Lava', 'brand_name_Meizu',
'brand_name_Micromax', 'brand_name_Microsoft', 'brand_name_Motorola',
'brand_name_OnePlus', 'brand_name_Oppo',
'brand_name_Others', 'brand_name_Panasonic',
'brand_name_Samsung', 'brand_name_Sony', 'brand_name_Spice',
'brand_name_Vivo', 'brand_name_XOLO',
'brand_name_ZTE', 'os_Windows',
'int_memory_log'], axis=1
)
# let's make predictions on the test set
y_pred = olsres_10.predict(X_test2)
# let's check the RMSE on the train data
rmse1 = np.sqrt(mean_squared_error(y_train, df_pred["Fitted Values"]))
rmse1
0.23494856735193043
# let's check the RMSE on the test data
rmse2 = np.sqrt(mean_squared_error(y_test, y_pred))
rmse2
0.2371701947347765
# let's check the MAE on the train data
mae1 = mean_absolute_error(y_train, df_pred["Fitted Values"])
mae1
0.18284538784636892
# let's check the MAE on the test data
mae2 = mean_absolute_error(y_test, y_pred)
mae2
0.18472572202588955
We can see that RMSE on the train and test sets are comparable. So, our model is not suffering from overfitting. MAE indicates that our current model is able to predict normalized used phone price within a mean error of 0.18 units on the test data. Hence, we can conclude the model "ols_res10" is good for prediction as well as inference purposes.
-
# Let us write the equation of linear regression
Equation = "normalized used price ="
print(Equation, end=" ")
for i in range(len(X_train8.columns)):
if i == 0:
print(olsres_10.params[i], "+", end=" ")
elif i != len(X_train8.columns) - 1:
print(
olsres_10.params[i],
"* (",
X_train8.columns[i],
")",
"+",
end=" ",
)
else:
print(olsres_10.params[i], "* (", X_train8.columns[i], ")")
normalized used price = -69.67313932569556 + 0.039877972977253644 * ( 4g ) + 0.02134395159749451 * ( main_camera_mp ) + 0.012698102126027878 * ( selfie_camera_mp ) + 0.024208105729143115 * ( ram ) + 0.0016951490000041145 * ( weight ) + 0.03522982831492093 * ( release_year ) + 0.43857548521714096 * ( normalized_new_price ) + 0.058596637911585814 * ( brand_name_Asus ) + 0.045968934567268166 * ( brand_name_Lenovo ) + 0.06649610118659377 * ( brand_name_Nokia ) + 0.11451237879725928 * ( brand_name_Realme ) + 0.10735308947625927 * ( brand_name_Xiaomi ) + -0.13248513068895654 * ( os_Others ) + -0.09997579613975094 * ( os_iOS )
Our model statistically shows that the most relevant variables when calculating used phone prices are as follows: 4g capabilities, main camera megapixels, selfie camera megapixels, RAM, weight, release year, normalized new price, brand names (Asus, Lenovo, Nokia, Realme, and Xiami), operating systems with Others and Apple.